The problem of interaction in multidimensional contingency tables is investigated from the
viewpoint of information theory as developed by Kullback. The hypothesis of no $r$th-order interaction
is defined in the sense of an hypothesis of "generalized" independence of classifications with fixed
$r$th-order marginal restraints. For a three-way table, with given cell probabilities $\pi_{ijk}$, the minimum
discrimination information for a contingency table with marginals $p_{ij\cdot}$, $p_{\cdot jk}$, and $p_{i\cdot k}$ is given by the
set of cell probabilities $p^*_{ijk} = a_{ij} b_{jk} c_{ik} \pi_{ijk}$, where $a_{ij}$, $b_{jk}$, and $c_{ik}$ are functions of the given marginal
probabilities, that is, $\ln (p^*_{ijk}/\pi_{ijk}) = \ln a_{ij} + \ln b_{jk} + \ln c_{ik}$, representing no second-order interaction.
The minimum discrimination information statistic, asymptotically distributed as $\chi^2$ with appropriate
degrees of freedom, is

$$2 \sum_{ijk} x_{ijk} \ln x_{ijk} - 2 \sum_{ijk} x_{ijk} \ln x^*_{ijk},$$

where $x_{ijk}$ are the observed cell frequencies and $x^*_{ijk}$ are the "no interaction" cell frequencies uniquely
determined by a simple convergent iteration process of the marginals on $\pi_{ijk}$. For lower order marginal
restraints the usual independence hypotheses are generated when $\pi_{ijk}$ are taken to be the cell probabilities
under uniform distribution. It is shown that the set $p^*_{ijk}$ satisfies definitions of no second-order
interaction in a $2 \times 2 \times 2$ table given by Bartlett and no interaction in an $r \times s \times t$ table by Roy and
Kastenbaum, and is also related to that given by Good. Results of application to the analysis of some
"classical" three-dimensional contingency tables are given, together with full details for two
four-dimensional examples.

Key Words: Contingency tables; estimation of cell frequencies from marginals; generalized inde- 
pendence; hypothesis testing; information theory; interaction; second-order inter- 
action. 

Introduction 

In the last decade, a series of related papers appeared in various publications on the analysis 
of multiway contingency tables. The topic of particular interest was that of the definition and 
treatment of higher-order interaction. Among these papers, we cite, for instance, Roy and
Kastenbaum [1956],³ Plackett [1962], Darroch [1962], Birch [1963], Good [1963], and Goodman [1964a,
1964b]. Also related were The Estimation of Probabilities by Good [1965], two reports by Bhapkar
and Koch [1965, 1966], and their recent publication in Technometrics [1968].

¹ Presented by title to the Institute of Mathematical Statistics, Columbus, Ohio 23-25 March 1967 under the title Interaction in Multi-dimensional Contingency
Tables; Abstract in Annals of Mathematical Statistics, Vol. 38, 297 (1967).

² Supported in part by the Air Force Office of Scientific Research, Office of Aerospace Research, United States Air Force, under Grant AF-AFOSR 932-65.

³ Figures in brackets indicate the literature references at the end of this paper.

In these papers, various aspects of the problem were treated and solutions offered. Conse- 
quently, results given are usually concerned with individual parts of the problem, depending on 
the author's motivation and interest. We propose to show that some of these results can be unified
through the use of information theory. The formulation, comparisons, and proofs will be given in 
section 2, and applications to the analysis of multidimensional contingency tables in section 3, 
following a historical review in section 1. First, however, we shall give a brief description of the 
problem and the notations that will be used. 

Consider a random sample of n independent observations, where each observation can be 
classified by m criteria of classifications, say: row (R), column (C), and depth (D), for instance. 
Suppose there are $r$ categories in the row classification, $c$ categories in the column classification,
and $d$ categories in the depth classification, then there are exactly $r \times c \times d$ cells in the
three-dimensional table. Each observation by definition must fall in one of the cells, say, the $ijk$th cell, with
probability $p_{ijk}$, where $i = 1, 2, \ldots, r$, $j = 1, 2, \ldots, c$, $k = 1, 2, \ldots, d$ are the subscripts relating
the cell to the categories in the row, column, and depth classifications respectively.

Let $x_{ijk}$ represent the observed frequency of the $ijk$th cell in the sample, so that $\sum_{ijk} x_{ijk} = n$.
Then we have summarized the data in the form of a three-way contingency table with cell
probabilities $p_{ijk}$, $\sum_{ijk} p_{ijk} = 1$.

Summing over the categories of one classification or of two classifications, we obtain two-way
and one-way marginal tables respectively, or in symbols:

$$x_{ij\cdot} = \sum_k x_{ijk}, \qquad x_{i\cdot\cdot} = \sum_j x_{ij\cdot} = \sum_{jk} x_{ijk},$$

and the like. Corresponding to these marginal frequency tables, we have similar marginal tables
for the probabilities.

In the analysis of contingency tables we are usually interested in the relationship between
one classification and one or more of the other classifications. Suppose the row classification 
represents the response of an experiment on animals, the column classification types of treatment, 
and the depth classification a distinguishable characteristic of the sampled individuals, sex, for 
instance. Then in many respects the hypotheses of interest are analogous to those of independence 
and correlation in normal multivariate analysis, e.g., 

1. Response is independent of treatment, or 

$$H_0: p_{ij\cdot} = p_{i\cdot\cdot}\, p_{\cdot j\cdot}.$$

This case corresponds to simple correlation. That is, $H_0$ corresponds to the hypothesis that response
and treatment are uncorrelated.

2. Response is independent of treatment and sex, or 

$$H_0: p_{ijk} = p_{i\cdot\cdot}\, p_{\cdot jk}.$$

This case corresponds to multiple correlation. 

3. Response is independent of treatment, given the sex, or 

$$H_0: p_{ijk} = \frac{p_{i\cdot k}\, p_{\cdot jk}}{p_{\cdot\cdot k}}.$$

This case corresponds to partial correlation. 
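
The three hypotheses above can be made concrete with a small numerical sketch. The Python code below is an illustration only, not part of the original analysis: the table is stored as a dictionary keyed by `(i, j, k)`, and the helper names `marginal` and `fitted_under_hypotheses`, as well as the example probabilities, are hypothetical.

```python
def marginal(p, axes):
    """Sum a table p (dict: (i, j, k) -> probability) onto the given axes."""
    out = {}
    for cell, v in p.items():
        key = tuple(cell[a] for a in axes)
        out[key] = out.get(key, 0.0) + v
    return out


def fitted_under_hypotheses(p):
    """Cell probabilities implied by hypotheses 1-3 for a three-way table p."""
    r, c, d = marginal(p, (0,)), marginal(p, (1,)), marginal(p, (2,))
    cd, rd = marginal(p, (1, 2)), marginal(p, (0, 2))
    # 1. p_ij. = p_i.. p_.j.   (a statement about the two-way R x C marginal)
    h1 = {(i, j): r[(i,)] * c[(j,)] for (i,) in r for (j,) in c}
    # 2. p_ijk = p_i.. p_.jk
    h2 = {(i, j, k): r[(i,)] * cd[(j, k)] for (i, j, k) in p}
    # 3. p_ijk = p_i.k p_.jk / p_..k
    h3 = {(i, j, k): rd[(i, k)] * cd[(j, k)] / d[(k,)] for (i, j, k) in p}
    return h1, h2, h3


# Hypothetical table in which response (i) and treatment (j) are independent
# within each sex (k), but with different conditional distributions per sex.
q = [0.4, 0.6]                      # P(sex = k)
a = [[0.2, 0.8], [0.5, 0.5]]        # a[k][i] = P(response = i | sex = k)
b = [[0.3, 0.7], [0.6, 0.4]]        # b[k][j] = P(treatment = j | sex = k)
p = {(i, j, k): q[k] * a[k][i] * b[k][j]
     for i in range(2) for j in range(2) for k in range(2)}

h1, h2, h3 = fitted_under_hypotheses(p)
print(max(abs(h3[cell] - p[cell]) for cell in p))  # hypothesis 3 reproduces p
print(max(abs(h2[cell] - p[cell]) for cell in p))  # hypothesis 2 does not
```

In this constructed example only the third hypothesis holds, which illustrates that independence given the third classification does not imply the stronger hypothesis 2.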

Of course not all contingency tables can be interpreted in such a straightforward manner. 
In some cases all three classifications can be considered as responses; then we may be interested 
in the independence or associations among these responses. In other cases a classification may 
be viewed either as a factor or a response. For convenience, we may group all the concepts of
association, dependence, etc., under the general term of interaction. No interaction between 
treatment and sex appears to be a more acceptable phrase than independence between treatment 
and sex since independence is usually reserved to express the relationship between two random 
variables. We may also say that the interaction between response and treatment does not interact 
with sex, meaning the degree of association between response and treatment is the same for both 
sexes. 

Here we come to grips with a concept which gives rise to the idea of second-order interaction 
(sometimes called a three-factor interaction when applied to a three-way table). The definition, 
method of analysis, and the interpretation of the second-order and higher-order interactions have 
been the source of controversy for a number of years. It is probably worthwhile to review the 
historical development of the problem and to summarize the various schools of thought on the 
subject to gain a proper perspective for the present treatment. A brief account will be given in 
the next section. 

1. Historical Background. 

1.1. Formulation of the No-Interaction Hypothesis 

The first use of the "no second-order interaction" hypothesis as relating to a $2 \times 2 \times 2$
contingency table was due to Bartlett [1935]. The concept remained dormant for a number of years.
Lewis [1962], in his excellent review on the subject, lamented that ". . . there is still no coordi- 
nated information available, and the treatment of these tables is still widely neglected in standard 
text books." 

Bartlett's definition was mainly intuitive. Since then, there have been several attempts to 
arrive at a logical, consistent, and intuitively acceptable definition that could be derived from 
within a wider framework of hypothesis formulation. The main lines of thought can be grouped 
into the following classifications: 

1. Bartlett's original definition and its extensions. 

2. Simpson, Plackett, and Darroch's formulations based on symmetrical functions of the 
cell probabilities. 

3. Good's definition based on maximum entropy and Goodman's modification. 

1.2. Bartlett's Definition and Its Extension by Roy and Kastenbaum 

Bartlett defined his term, formulated the hypothesis, proposed the statistic and suggested
a method for the solution in less than 25 lines! To use the author's words, "The testing of
independence in a $2 \times 2$ table with fixed marginal totals, may be regarded as testing the significance
of the interaction between the two classifications . . .. Corresponding to the hypothesis to be
tested in an ordinary fourfold table (i.e., a $2 \times 2$ table) of observed numbers $n_1$, $n_2$, $n_3$, $n_4$ that
$p_1 p_4 = p_2 p_3$, we require to test the hypothesis (of no second-order interaction) that

$$p_1 p_4 p_6 p_7 = p_2 p_3 p_5 p_8." \tag{1.1}$$

(i.e., $p_{111}\, p_{122}\, p_{212}\, p_{221} = p_{112}\, p_{121}\, p_{211}\, p_{222}$.)

Thus for a $2 \times 2$ table with fixed marginal totals, Bartlett's definition of no first-order interaction
implies and is implied by independence of the two classifications. Furthermore, he assumed
that the cross-product ratio type of hypothesis can be extended to define second-order interaction
for $2 \times 2 \times 2$ tables. It is remarkable that his definition remains the preferred one to this date and
the same hypothesis has been arrived at by others through different approaches.
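
Condition (1.1) is easy to check numerically. The following is a minimal Python sketch, assuming the table is stored as a nested list `p[i][j][k]` with 0-based indices (so `p[0][0][0]` plays the role of $p_{111}$). The function names and the example table are hypothetical and serve only as illustration.

```python
def cross_product_ratio(p, k):
    """Cross-product ratio of the 2 x 2 (row by column) layer at depth index k."""
    return (p[0][0][k] * p[1][1][k]) / (p[0][1][k] * p[1][0][k])


def satisfies_bartlett(p, tol=1e-12):
    """Check (1.1): p111 p122 p212 p221 = p112 p121 p211 p222."""
    lhs = p[0][0][0] * p[0][1][1] * p[1][0][1] * p[1][1][0]
    rhs = p[0][0][1] * p[0][1][0] * p[1][0][0] * p[1][1][1]
    return abs(lhs - rhs) <= tol


# Hypothetical cell weights w[i][j][k]; both depth layers are built to have
# the same cross-product ratio (2*3)/(1*1) = (4*3)/(1*2) = 6.
w = [[[2, 4], [1, 1]],
     [[1, 2], [3, 3]]]
total = sum(v for row in w for col in row for v in col)
table = [[[v / total for v in col] for col in row] for row in w]

print(cross_product_ratio(table, 0), cross_product_ratio(table, 1))
print(satisfies_bartlett(table))
```

Equal cross-product ratios in the two layers and equality in (1.1) are the same statement, which is why the two checks agree.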

Bartlett's definition, however, becomes complicated when the categories within a classification
are more than two, a difficulty acknowledged by him in the latter part of his paper. Proper
interpretation of results of the test also becomes difficult. Moreover, the solution requires lengthy
iterative computation for the solution of $(r-1)(c-1)(d-1)$ simultaneous third degree equations,
where $r$, $c$, and $d$ are, respectively, the number of categories in the row ($R$), column ($C$), and depth ($D$)
classifications.

Norton [1945] extended Bartlett's definition to $2 \times 2 \times d$ tables, and devised an iterative
procedure for solving these systems of equations. Roy and Kastenbaum [1956], commenting that
"Bartlett's and Norton's papers do not give any indication of the mechanism behind the formula
for the hypothesis of no interaction . . .", derived a set of "no interaction constraints" in an
$r \times c \times d$ table in the form of

$$\frac{p_{rcd}\, p_{ijd}}{p_{icd}\, p_{rjd}} = \frac{p_{rck}\, p_{ijk}}{p_{ick}\, p_{rjk}} \qquad \text{for } i = 1, 2, \ldots, (r-1);\; j = 1, 2, \ldots, (c-1);\; k = 1, 2, \ldots, (d-1). \tag{1.2}$$

The set of constraints reduces to (1.1) for a $2 \times 2 \times 2$ table.

The "mechanism" used by Roy and Kastenbaum is based on the fact that the two hypotheses

$$H_1: p_{i\cdot k} = p_{i\cdot\cdot}\, p_{\cdot\cdot k}$$

$$H_2: p_{ij\cdot} = p_{i\cdot\cdot}\, p_{\cdot j\cdot}$$

will not usually imply

$$H: p_{ijk} = p_{i\cdot\cdot}\, p_{\cdot jk}$$

in a three-way contingency table. The "no interaction" hypothesis is required to generate the
set of constraints such that these constraints, when superimposed on $H_1 \cap H_2$, should imply $H$.
The result is the set of constraints in (1.2). In contrast to Bartlett, Roy and Kastenbaum called
(1.2) the hypothesis of "no interaction" or "no first-order interaction." The extension of this
concept to the hypothesis of "no second-order interaction" in a four-way table was only indicated
in their paper.

1.3. Simpson, Plackett, and Darroch's Formulation 

Simpson [1951] required the definition of "no second-order interaction" to be symmetrical
with respect to the three attributes of a $2 \times 2 \times 2$ table. If some function $\psi(p_{111}, p_{121}, p_{211}, p_{221})$ is
chosen to measure the association of classifications $R$ and $C$ in $D$, then the function must be such
that the equation

$$\psi(p_{111}, p_{121}, p_{211}, p_{221}) = \psi(p_{112}, p_{122}, p_{212}, p_{222})$$

implies and is implied by the equations

$$\psi(p_{111}, p_{211}, p_{112}, p_{212}) = \psi(p_{121}, p_{221}, p_{122}, p_{222})$$

and

$$\psi(p_{111}, p_{121}, p_{112}, p_{122}) = \psi(p_{211}, p_{221}, p_{212}, p_{222}).$$

He showed that the function $\psi = \dfrac{p_{121}\, p_{211}}{p_{111}\, p_{221}}$, or the cross-product ratio used by Bartlett, satisfies this
requirement. Hence, Bartlett's definition for a $2 \times 2 \times 2$ table was accepted. The uniqueness of
this function was not discussed.

In a footnote to Simpson's paper, the editor suggested that "This paper should be read in
conjunction with the following paper by H. O. Lancaster." Lancaster [1951] defined the
second-order interaction by the partition of the chi-square statistic $X^2$; i.e., it is defined as the difference
between the total $X^2$ for testing complete independence of the three classifications, and the sum of
the three components corresponding to tests for independence in each of the three marginal tables.

Plackett [1962] compared Simpson's definition with Lancaster's definition [1951] and showed
that the latter does not always satisfy the condition of symmetry. He accepted Roy and
Kastenbaum's definition given in (1.2) for an $r \times c \times d$ table, and extended the analysis of log-frequencies
[Woolf 1955] to such tables as an alternative method of analysis which is computationally easier
than the solution of $(r-1)(c-1)(d-1)$ simultaneous equations of the third degree.

Darroch [1962] made an explicit comparison of the definitions of interaction in multiway
contingency tables and in the analysis of variance. He found that there are resemblances between the
two definitions but "that interactions in contingency tables enjoyed only a few of the fortuitously
simple properties of interactions in the analysis of variance." The main point he made (also made
by Roy and Kastenbaum) was that a natural symmetrical definition of "no second-order interaction"

$$p_{ijk} = \frac{p_{ij\cdot}\, p_{\cdot jk}\, p_{i\cdot k}}{p_{i\cdot\cdot}\, p_{\cdot j\cdot}\, p_{\cdot\cdot k}} \tag{1.3}$$

necessarily imposes constraints on the marginal probabilities $p_{ij\cdot}$, $p_{\cdot jk}$, $p_{i\cdot k}$, i.e.,

$$\sum_k p_{ijk} = p_{ij\cdot} = \frac{p_{ij\cdot}}{p_{i\cdot\cdot}\, p_{\cdot j\cdot}} \sum_k \frac{p_{\cdot jk}\, p_{i\cdot k}}{p_{\cdot\cdot k}},$$

or

$$\sum_k \frac{p_{\cdot jk}\, p_{i\cdot k}}{p_{\cdot\cdot k}} = p_{i\cdot\cdot}\, p_{\cdot j\cdot}$$

for all $i$, $j$, and the like. This is of course undesirable since the condition for "no second-order
interaction" should relate $p_{ijk}$ to any given set of marginal probabilities and should not place
restrictions on the latter [cf. p. 172 Kullback 1959].

Consequently Darroch defined a "perfect three-way table" as one for which condition (1.3),
and the resulting restraints on the marginal probabilities, are satisfied exactly. He concluded
further that "in imperfect tables it is not possible to express $p_{ijk}$ in terms of simple functions of
$p_{ij\cdot}$, $p_{i\cdot k}$, and $p_{\cdot jk}$ when there is no second-order interaction." The existence and uniqueness of
the set $p_{ijk}$ as the solution of (1.2) for any given set of mutually consistent marginal probabilities
was conjectured for $r \times c \times d$ tables and proved for the $2 \times 2 \times 2$ case. The search for a simple
formulation in terms of parameters which are implicitly defined by the marginal probabilities led
Darroch to define

$$p_{ijk} = \mu\, \alpha_{jk}\, \beta_{ki}\, \gamma_{ij},$$

where

$$\sum_k \alpha_{jk} = \sum_i \beta_{ki} = \sum_j \gamma_{ij} = 1,$$

and

$$\mu \sum_{ijk} \alpha_{jk}\, \beta_{ki}\, \gamma_{ij} = 1;$$

and showed that

$$\mu = 1, \qquad \alpha_{jk} = \frac{p_{\cdot jk}}{p_{\cdot j\cdot}}, \qquad \beta_{ki} = \frac{p_{i\cdot k}}{p_{\cdot\cdot k}}, \qquad \text{and} \qquad \gamma_{ij} = \frac{p_{ij\cdot}}{p_{i\cdot\cdot}}.$$
Since there is no solution in closed form to the maximum likelihood equations for the parameters
under hypothesis of no second-order interaction, unless the observed table happens to be perfect,
Darroch suggested an iterative solution and gave a numerical illustration using the example given
by Kastenbaum and Lamphiear [1959].

It is of interest to note that Darroch suggested the likelihood ratio test based on

$$2 \sum_{ijk} n_{ijk} \ln \left( \frac{n_{ijk}}{n\, \mu\, \alpha_{jk}\, \beta_{ki}\, \gamma_{ij}} \right),$$

which is asymptotically distributed as $\chi^2$ with $(r-1)(c-1)(d-1)$ degrees of freedom.

Birch [1963] accepted (1.2) as the definition of no second-order interaction in a 3-way table and
discussed maximum likelihood estimation of expected frequencies for many-way tables under
different hypotheses. He also proved the conjecture by Darroch that the expected frequencies in a
three-way table are uniquely given by the marginal totals if the expected frequencies are known to
be positive and to satisfy the hypothesis of no second-order interaction. Thus, given any set of
positive integers $n_{ijk}$ he showed that there is one and only one set of positive numbers $n^*_{ijk}$ that
satisfies the equations

$$n^*_{ij\cdot} = n_{ij\cdot}, \qquad n^*_{i\cdot k} = n_{i\cdot k}, \qquad \text{and} \qquad n^*_{\cdot jk} = n_{\cdot jk},$$

and also the conditions given by the no second-order interaction hypothesis expressed by (1.2).

1.4. Good's Formulation 

The formulations of the hypothesis of no second-order interaction summarized up to now are
basically extensions of Bartlett's. Justifications for such formulations are given in a number of
ways: (1) residually as the difference between the independence of one classification ($R$) with the
other two classifications ($CD$) and the two independence hypotheses ($R \times C$) and ($R \times D$);
(2) by symmetry requirements; (3) by analogy with analysis of variance. Lancaster [1951] also
proposed a formulation based strictly on the analogy of the partition of $\chi^2$ to the analysis of variance.
The shortcomings of his method were discussed by Lewis [1962] and Plackett [1961] and will not
be repeated here.

Good [1963] proposed to use the principle of maximum entropy as a heuristic principle for the 
generation of null hypotheses, with main application to m-dimensional contingency tables. Three 
versions of this principle are given in his paper. We quote here his Principle of Minimal Discrimi- 
nability: "Let X be a random variable whose distribution is subject to some set of restraints. Sup- 
pose that, before the restraints were known, there was some distribution that seemed reasonable 
to entertain as a null hypothesis, called an initially ausgezeichnet hypothesis. This hypothesis is 
perhaps refuted by the constraints. Then, in view of the restraints, entertain the null hypothesis 
that, if true, can be discriminated from the ausgezeichnet hypothesis at the minimum rate, i.e., 
for which the expected weight of evidence per observation is least." 

Numerous examples and theorems are given in Good's paper. By using his principle, it is shown
that for an $m$-dimensional $2 \times 2 \times \cdots \times 2$ contingency table $(p_{\mathbf{i}}) = (p_{i_1 i_2 \cdots i_m})$, $i_1, i_2, \ldots, i_m = 0, 1$,
and with all the marginal probabilities down to $(m-1)$-way assigned, the null hypothesis to be
tested is

$$\prod_{|\mathbf{i}| \text{ even}} p_{\mathbf{i}} = \prod_{|\mathbf{i}| \text{ odd}} p_{\mathbf{i}},$$

where $|\mathbf{i}| = i_1 + i_2 + \cdots + i_m$. The expression reduces to (1.1) when $m = 3$.
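
The even/odd product condition can be evaluated directly for any $2 \times 2 \times \cdots \times 2$ table. The Python sketch below is illustrative only: the table is assumed to be a dictionary keyed by 0/1 index tuples, and the function name `parity_products` and the example weights are hypothetical. For $m = 3$ the example reuses a table whose two layers share the same cross-product ratio, so the two products should agree, in line with (1.1).

```python
import itertools
import math


def parity_products(p, m):
    """Return (product over |i| even, product over |i| odd) for a 2^m table.

    p maps index tuples (i_1, ..., i_m), each entry 0 or 1, to probabilities.
    """
    even = odd = 1.0
    for cell in itertools.product((0, 1), repeat=m):
        if sum(cell) % 2 == 0:
            even *= p[cell]
        else:
            odd *= p[cell]
    return even, odd


# Hypothetical 2 x 2 x 2 weights; each depth layer has cross-product ratio 6.
weights = {
    (0, 0, 0): 2, (0, 1, 0): 1, (1, 0, 0): 1, (1, 1, 0): 3,
    (0, 0, 1): 4, (0, 1, 1): 1, (1, 0, 1): 2, (1, 1, 1): 3,
}
total = sum(weights.values())
table = {cell: w / total for cell, w in weights.items()}

even, odd = parity_products(table, 3)
print(even, odd, math.isclose(even, odd, rel_tol=1e-9))
```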

Good also generalized the definition to that of no rth-order and all higher-order interactions in 
an m-dimensional contingency table with a complete set of rth-order restraints by means of discrete 
Fourier transforms of the logarithms of probabilities. However, the interactions so defined are 
usually complex valued unless the categories within each classification are equal to two. Goodman
[1964] followed the definition by Good but proposed a test that yields real valued interactions.
Goodman's proposed test is based on Wald's criterion [1943] and unrestricted maximum likelihood
estimates, and is essentially an extension of the tests proposed by Plackett [1962] and Woolf [1955].
While Good's and Goodman's formulations and tests of no-interaction hypotheses are entirely
general, physical interpretations of their meanings became extremely difficult, if not impossible,
for interactions higher than the second, in which case these interactions reduce to the ones
discussed before. Bhapkar and Koch [1965, 1966, 1968] outlined the models, tests, and interpretations
of the hypothesis of no interaction in three-dimensional and four-dimensional contingency tables
in great detail, and compared results of using different statistics, all based on Wald's criterion, for
several numerical examples.



1.5. Conditions Essential to a Definition of Hypothesis of No Interaction in 
Multidimensional Contingency Tables 

Based on the above review of treatments of higher-order interactions, it appears that several 
basic and related concepts are important in the formulation of its definition: 

a. The Fixed Marginal Totals 

In fact, to talk about the $r$th-order interaction in an $m$-way table, the $r$-way marginals must be
considered fixed; for otherwise we would be considering a less restricted hypothesis which includes 
the no-interaction hypothesis as a subhypothesis. This concept was implied in Bartlett's and Roy's 
definition, and explicitly stated in Good's definition. Darroch and Birch also assumed fixed mar- 
ginals. Goodman and Bhapkar, on the other hand, did not make such a demand on their definitions, 
and hence their formulations are less desirable in the sense that their interaction does not measure 
the interaction as given by the data, but that possibly for another set of data with somewhat dif- 
ferent marginal totals. 

b. The Requirements of Symmetry 

This requirement is a "logically attractive" condition as stated by Simpson, and demands that 
the statistic be invariant upon relabeling of the classifications. This requirement is again satisfied 
by all investigations with the exception of Goodman, and Bhapkar and Koch, who maintained that 
the symmetry requirement is not necessarily desirable for certain physical interpretations. 

c. Unique Set of Cell Probabilities ($p_{\mathbf{i}} > 0$)

The no-interaction hypothesis, which is presumably the last hypothesis to be tested in a hier- 
archy of hypotheses, should determine the cell probabilities uniquely. This condition was conjec- 
tured by Darroch for his formulation of the no second-order interaction hypothesis and shown to 
be true by Birch and Good. A measure of deviations of the data from this set of cell probabilities 
would therefore be a measure of interaction. 

d. Additivity of the Statistics 

The "mechanism" used by Roy and Kastenbaum in their definition of no interaction demands
that $H_1$ ($R$ is independent of $C$) $\cap$ $H_2$ ($R$ is independent of $D$) $\cap$ $H_3$ (no interaction) implies $H$ ($R$ is
independent of $CD$). The same requirement was also discussed by Birch [1963]. In general, if a
more restrictive hypothesis can be considered as the intersection of several less restrictive
hypotheses, it would be logical and desirable to require the test statistics for the component hypotheses
to sum up to that of the more restrictive hypothesis. This requirement is not fulfilled by the usual
$X^2$ statistic except in an asymptotic sense. The additive analysis of component variation, similar
to that of analysis of variance, is a desirable feature of information analysis as noted by Lewis
[1962].
2. Interaction from the Viewpoint of Information Theory

2.1. The Minimum Discrimination Information Statistic (m.d.i.s.) 

In the analysis of contingency table data, two major types of hypotheses are usually postulated 
and tested. One is a test of the cell frequencies of an observed sample table against known or given 
probabilities of a table of the same size, the other is a test of the structural relationships that seem 
reasonable among the classifications, e.g., association or independence among responses, homo- 
geneity of response over several categories of a factor, etc. 

In Kullback [1959], and two papers by Kullback, Kupperman, and Ku [1962a, 1962b], a number 
of useful tests for contingency tables based on the notion of information theory are given for hy- 
potheses which can be expressed explicitly as functions of specified marginal probabilities. The 
minimum discrimination information statistic, m.d.i.s., was suggested as the test statistic which, in 
its simplest form for a two-way contingency table, can be expressed as 

$$2nI(p:\pi) = 2 \sum_i \sum_j x_{ij} \ln \frac{x_{ij}}{n \pi_{ij}}, \tag{2.1}$$

where $\pi_{ij}$ is the probability of an observation from the $i$th row and $j$th column of the table under
the null hypothesis, $\sum_{ij} \pi_{ij} = 1$, $x_{ij}$ is the observed frequency of occurrence in the corresponding cell,
$\sum_{ij} x_{ij} = n$, and $\ln$ is the natural logarithm. $0 \ln 0$ is defined as zero.

Similarly, the m.d.i.s. for the test of independence between the row and column classifications,
i.e., $\pi_{ij} = \pi_{i\cdot}\pi_{\cdot j}$, is shown to be

$$2nI(p:\pi) = 2 \sum_{ij} x_{ij} \ln \frac{n x_{ij}}{x_{i\cdot}\, x_{\cdot j}}, \tag{2.2}$$

where $x_{i\cdot} = \sum_j x_{ij}$, $x_{\cdot j} = \sum_i x_{ij}$ are the row and column marginal frequencies of the two-way table.
We may consider either of the two tests given by (2.1) and (2.2) to be a comparison of the
observed frequencies against a set of frequencies in a constructed table represented by $x^*_{ij} = np^*_{ij}$,
where $\{p^*_{ij}\}$ is the set of cell probabilities that "most" resembles $\{\pi_{ij}\}$ subject to certain marginal
probability restrictions. In fact, the set of $\{p^*_{ij}\}$ can be obtained by minimizing the discrimination
information

$$2I(p:\pi) = 2 \sum_{ij} p_{ij} \ln \frac{p_{ij}}{\pi_{ij}} \tag{2.3}$$

subject to these restrictions. For the case considered under (2.1), the minimum value is zero for
$p^*_{ij} = \pi_{ij}$; the restriction $\sum_{ij} np_{ij} = n$ is always fulfilled. For the case (2.2), the minimum value of (2.3)
is attained for $p^*_{ij} = p_{i\cdot}\, p_{\cdot j}$, with the restriction $np_{i\cdot} = x_{i\cdot}$ and $np_{\cdot j} = x_{\cdot j}$, as shown in Kullback et al.
[1962b].
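
As an illustration of (2.2), the following Python sketch computes the m.d.i.s. for independence in a two-way table of counts. It is only a sketch: the function name `mdis_independence` and the example counts are hypothetical, and cells with zero count are skipped in keeping with the convention that $0 \ln 0$ is zero.

```python
import math


def mdis_independence(x):
    """m.d.i.s. (2.2) for independence in a two-way table of counts x[i][j].

    Computes 2 * sum_ij x_ij * ln(n * x_ij / (x_i. * x_.j)).
    """
    n = sum(sum(row) for row in x)
    row_totals = [sum(row) for row in x]
    col_totals = [sum(col) for col in zip(*x)]
    stat = 0.0
    for i, row in enumerate(x):
        for j, x_ij in enumerate(row):
            if x_ij > 0:  # 0 ln 0 is taken as zero
                stat += x_ij * math.log(n * x_ij / (row_totals[i] * col_totals[j]))
    return 2.0 * stat


counts = [[10, 20], [30, 40]]          # hypothetical observed frequencies
proportional = [[10, 20], [20, 40]]    # rows proportional: exact independence

stat_counts = mdis_independence(counts)
stat_proportional = mdis_independence(proportional)
print(stat_counts)
print(stat_proportional)  # zero up to rounding error
```

For an $r \times c$ table the statistic would be referred to the $\chi^2$ distribution with $(r-1)(c-1)$ degrees of freedom, the usual count for the independence hypothesis.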

It would clearly be desirable if this concept could be extended to the formulation of second- 
order and higher-order interactions. Here, however, we encounter essentially the same difficulties 
as discussed by Darroch, i.e., these interaction hypotheses cannot be formulated in terms of ex- 
plicit functions of the marginal probabilities such that these functions also satisfy all the fixed 
marginal total restraints. To resolve these difficulties, we need to give a number of new results 
with information theoretical background due to Ireland and Kullback [1968] in conjunction with 
their study of an estimation problem first considered by Deming and Stephan [1940]. We shall 
summarize these results in the next section and show how they may be applied to give a unified 
approach to the analysis of interactions in multidimensional contingency tables. 
2.2. Summary of Current Results

In the following we shall present a number of current results in the form of three theorems. 
Proofs for theorems 2.1 and 2.2 using properties of the minimum discrimination information are 
given in Ireland and Kullback [1968], and will not be repeated here. The use and interpretations of 
these theorems in our case for the purpose of hypothesis testing, however, are quite different from 
that of their paper which essentially treats a problem in estimation. These differences will be 
discussed and appropriate modifications to the statement of these theorems incorporated. 

THEOREM 2.1. Given a contingency table $\{\pi_{ij}\}$, $i = 1, 2, \ldots, r$, $j = 1, \ldots, c$, $\pi_{ij} > 0$, $\sum_{ij} \pi_{ij} = 1$.
Consider all contingency tables $\{p_{ij}\}$ of the same dimension such that the marginal probabilities
$p_{i\cdot} = \sum_j p_{ij}$ and $p_{\cdot j} = \sum_i p_{ij}$ are given and fixed. Then the minimum value of the discrimination
information

$$I(p:\pi) = \sum_{ij} p_{ij} \ln \frac{p_{ij}}{\pi_{ij}} \tag{2.4}$$

is attained for $p_{ij} = p^*_{ij} = a_i b_j \pi_{ij}$, where the $a_i$'s and $b_j$'s are determined subject to the marginal
probability restrictions.

Deming and Stephan [1940] considered the problem of estimation of cell probabilities from a
sample of observations in an $r \times c$ table for which the population marginal probabilities $p_{i\cdot}$ and $p_{\cdot j}$
are known and fixed. Hence if we use the maximum likelihood estimates of the cell probabilities,
$\pi_{ij} = n_{ij}/n$, which do not necessarily satisfy these marginal restraints, then the question can be
posed "What is the $p_{ij}$ distribution satisfying these restraints and also 'closest' to the observed
sample in some sense?" Deming and Stephan [1940] suggested using estimates that minimized

$$(X')^2 = \sum_{ij} (n_{ij} - np_{ij})^2 / n_{ij}. \tag{2.5}$$

Ireland and Kullback [1968] suggested minimizing the discrimination information (2.4) which
generates RBAN estimators, as does Deming and Stephan's procedure.

We note that $\pi$ is not specified in the theorem. If, instead of letting $\pi_{ij} = n_{ij}/n$, we use $\pi_{ij}$ to
represent the cell probabilities of some reasonable hypothesis which we are interested in, or the
ausgezeichnet hypothesis in the sense of Good, then the $p^*_{ij}$ distribution will represent the
distribution that is "closest" to this hypothesized distribution subject to the marginal restraints in the
sense of minimum discrimination information or "minimal discriminability". For instance, if
$\pi_{ij} = \pi_{i\cdot}\pi_{\cdot j}$, or the hypothesis of independence of row and column classifications, then by theorem
2.1,

$$p^*_{ij} = a_i b_j \pi_{ij},$$

$$p_{i\cdot} = \sum_j p^*_{ij} = a_i \sum_j b_j \pi_{ij},$$

$$p_{\cdot j} = \sum_i p^*_{ij} = b_j \sum_i a_i \pi_{ij},$$

and

$$p^*_{ij} = a_i b_j \pi_{ij} = \frac{p_{i\cdot}\, p_{\cdot j}\, \pi_{ij}}{\left(\pi_{i\cdot} \sum_j b_j \pi_{\cdot j}\right)\left(\pi_{\cdot j} \sum_i a_i \pi_{i\cdot}\right)} = \frac{p_{i\cdot}\, p_{\cdot j}}{\left(\sum_j b_j \pi_{\cdot j}\right)\left(\sum_i a_i \pi_{i\cdot}\right)} = p_{i\cdot}\, p_{\cdot j},$$

since $\sum_{ij} a_i b_j \pi_{ij} = 1$.

The hypothesis to be tested is then the independence of the two classifications subject to the
restraints $np_{i\cdot} = x_{i\cdot}$ and $np_{\cdot j} = x_{\cdot j}$, the fixed marginal totals.

Theorem 2.1 is stated in terms of a two-way contingency table for notational convenience. For
a three-way contingency table, if $\pi_{ijk} = \pi_{i\cdot\cdot}\pi_{\cdot j\cdot}\pi_{\cdot\cdot k}$, then $p^*_{ijk} = a_i b_j c_k \pi_{ijk}$ and the hypothesis to be
tested is $p_{ijk} = p_{i\cdot\cdot}\, p_{\cdot j\cdot}\, p_{\cdot\cdot k}$, following the same derivation as the two-way table exactly. We shall
define all such hypotheses where the one-way marginals are completely specified as the no
first-order interaction hypothesis.

Extending this concept, we shall define no second-order interaction in a three-way contingency
table as represented by the $p^*_{ijk}$ distribution when all the two-way marginals are considered as fixed
and for appropriate selection of $\pi_{ijk}$. The justification for this formulation will be discussed in
subsection 3.2. Theorem 2.1 can be then restated as

THEOREM 2.1A. Given a contingency table $\{\pi_{ijk}\}$, $i = 1, 2, \ldots, r$, $j = 1, 2, \ldots, c$, $k = 1, 2, \ldots, d$,
$\sum_{ijk} \pi_{ijk} = 1$, where the $\pi_{ijk} > 0$ represent cell probabilities of some reasonable hypothesis. Consider
all contingency tables $\{p_{ijk}\}$ of the same dimension with fixed two-way marginals $p_{ij\cdot}$, $p_{\cdot jk}$, and
$p_{i\cdot k}$, then the minimum value of the quantity

$$I(p:\pi) = \sum_{ijk} p_{ijk} \ln \frac{p_{ijk}}{\pi_{ijk}}$$

is attained for $p_{ijk} = p^*_{ijk} = a_{ij} b_{jk} c_{ik} \pi_{ijk}$, $\sum_{ijk} a_{ij} b_{jk} c_{ik} \pi_{ijk} = 1$, where $a_{ij}$, $b_{jk}$, and $c_{ik}$ are functions of
the given two-way marginal probabilities. Equivalently, the condition may be stated as

$$\ln \frac{p^*_{ijk}}{\pi_{ijk}} = \ln a_{ij} + \ln b_{jk} + \ln c_{ik}, \tag{2.7}$$

representing no second-order interaction among the three classifications.

To compute the numerical values of $p^*_{ijk}$, we need

THEOREM 2.2. The set of $p^*_{ij}$ in Theorem 2.1 can be computed by an iterative procedure alternately
satisfying one and then the other marginal restraints. The iteration is given by

$$p^{(2n-1)}_{ij} = \frac{p_{i\cdot}}{p^{(2n-2)}_{i\cdot}}\, p^{(2n-2)}_{ij}, \qquad p^{(2n)}_{ij} = \frac{p_{\cdot j}}{p^{(2n-1)}_{\cdot j}}\, p^{(2n-1)}_{ij}, \qquad n = 1, 2, \ldots, \qquad p^{(0)}_{ij} = \pi_{ij}.$$

If $p^{(N)}_{ij}$ represent the value of the cell probabilities after the $N$th iteration, then either $p^{(N)}_{ij} = p^*_{ij}$
for some finite $N$, or $\lim_{N \to \infty} p^{(N)}_{ij} = p^*_{ij}$.

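We shall indicate the first few steps of the iteration process for a two-way table using the
relationship

$$p^*_{ij} = a_i b_j \pi_{ij}, \qquad p_{i\cdot} = a_i \sum_j b_j \pi_{ij}, \qquad p_{\cdot j} = b_j \sum_i a_i \pi_{ij}.$$

Let $b^{(1)}_j = 1$, then

$$p_{i\cdot} = a^{(1)}_i \pi_{i\cdot}, \qquad p^{(1)}_{ij} = \frac{p_{i\cdot}}{\pi_{i\cdot}}\, \pi_{ij} = a^{(1)}_i b^{(1)}_j \pi_{ij},$$

$$p_{\cdot j} = b^{(2)}_j \sum_i a^{(1)}_i \pi_{ij}, \qquad p^{(2)}_{ij} = \frac{p_{\cdot j}}{p^{(1)}_{\cdot j}}\, p^{(1)}_{ij} = a^{(1)}_i b^{(2)}_j \pi_{ij},$$

$$p_{i\cdot} = a^{(2)}_i \sum_j b^{(2)}_j \pi_{ij}, \qquad p^{(3)}_{ij} = \frac{p_{i\cdot}}{p^{(2)}_{i\cdot}}\, p^{(2)}_{ij} = a^{(2)}_i b^{(2)}_j \pi_{ij},$$

$$p_{\cdot j} = b^{(3)}_j \sum_i a^{(2)}_i \pi_{ij}, \qquad p^{(4)}_{ij} = \frac{p_{\cdot j}}{p^{(3)}_{\cdot j}}\, p^{(3)}_{ij} = a^{(2)}_i b^{(3)}_j \pi_{ij},$$

$$\vdots$$

$$p_{\cdot j} = b^{(n+1)}_j \sum_i a^{(n)}_i \pi_{ij} = p^{(2n)}_{\cdot j}, \qquad p^{(2n)}_{ij} = \frac{p_{\cdot j}}{p^{(2n-1)}_{\cdot j}}\, p^{(2n-1)}_{ij} = a^{(n)}_i b^{(n+1)}_j \pi_{ij},$$

etc.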

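Of course if $\pi_{ij} = \pi_{i\cdot}\pi_{\cdot j}$, then after one cycle of iteration on the two marginals, we have

$$p^{(2)}_{ij} = \frac{p_{\cdot j}}{\sum_i \dfrac{p_{i\cdot}}{\pi_{i\cdot}}\, \pi_{i\cdot}\pi_{\cdot j}} \cdot \frac{p_{i\cdot}}{\pi_{i\cdot}}\, \pi_{i\cdot}\pi_{\cdot j} = p_{i\cdot}\, p_{\cdot j}$$

and the marginal restraints are satisfied exactly. Hence the iteration process terminates and the
solution is exact.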
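
The iteration of Theorem 2.2 can be sketched in a few lines of Python. This is an illustrative implementation under stated assumptions, not the authors' program: the function name `fit_two_way`, the stopping tolerance, and the example tables and marginals are hypothetical. One cycle consists of a row step followed by a column step.

```python
def fit_two_way(pi, row_marg, col_marg, max_cycles=1000, tol=1e-12):
    """Alternately rescale rows and columns of pi to match the given marginals.

    pi        : starting table pi[i][j], positive entries summing to 1
    row_marg  : target row marginals p_i.
    col_marg  : target column marginals p_.j
    Returns (table, cycles_used).
    """
    p = [row[:] for row in pi]
    for cycle in range(1, max_cycles + 1):
        # Odd step: scale each row to its target marginal.
        p = [[v * row_marg[i] / sum(row) for v in row] for i, row in enumerate(p)]
        # Even step: scale each column to its target marginal.
        col_sums = [sum(col) for col in zip(*p)]
        p = [[v * col_marg[j] / col_sums[j] for j, v in enumerate(row)] for row in p]
        # Columns now fit exactly; stop once the rows fit as well.
        if max(abs(sum(row) - row_marg[i]) for i, row in enumerate(p)) < tol:
            return p, cycle
    return p, max_cycles


rows, cols = [0.3, 0.7], [0.6, 0.4]            # hypothetical fixed marginals

# Starting from a product table pi_ij = pi_i. pi_.j, one cycle is exact.
pi_product = [[0.5 * 0.2, 0.5 * 0.8], [0.5 * 0.2, 0.5 * 0.8]]
fit_product, cycles_product = fit_two_way(pi_product, rows, cols)
print(cycles_product, fit_product)

# Starting from a table with association, several cycles are needed, and the
# cross-product ratio of pi (here 16) is carried over to the fitted table.
pi_assoc = [[0.4, 0.1], [0.1, 0.4]]
fit_assoc, cycles_assoc = fit_two_way(pi_assoc, rows, cols)
print(cycles_assoc, fit_assoc)
```

Because each step multiplies a whole row or a whole column by a constant, the fitted table keeps the form $a_i b_j \pi_{ij}$ throughout, which is why the cross-product ratios of $\pi$ are preserved.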

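The results of iteration with different sets of specified marginals for a four-way contingency
table $\pi_{ijkl}$, $i = 1, \ldots, r$, $j = 1, \ldots, c$, $k = 1, \ldots, d$, and $l = 1, \ldots, t$, are given below for
purpose of illustration.

Let the given marginals be

$$p_{i\cdot\cdot\cdot},\; p_{\cdot j\cdot\cdot},\; p_{\cdot\cdot k\cdot},\; p_{\cdot\cdot\cdot l}, \tag{2.8}$$

then

$$\begin{aligned}
p^*_{ijkl} &= a_i b_j c_k d_l \pi_{ijkl}, \\
p_{i\cdot\cdot\cdot} &= a_i \sum_{jkl} b_j c_k d_l \pi_{ijkl}, \\
p_{\cdot j\cdot\cdot} &= b_j \sum_{ikl} a_i c_k d_l \pi_{ijkl}, \\
p_{\cdot\cdot k\cdot} &= c_k \sum_{ijl} a_i b_j d_l \pi_{ijkl}, \\
p_{\cdot\cdot\cdot l} &= d_l \sum_{ijk} a_i b_j c_k \pi_{ijkl}.
\end{aligned} \tag{2.9}$$

The iterative solution of the system (2.9) cycles through

$$\begin{aligned}
p^{(4n+1)}_{ijkl} &= \frac{p_{i\cdot\cdot\cdot}}{p^{(4n)}_{i\cdot\cdot\cdot}}\, p^{(4n)}_{ijkl}, &
p^{(4n+2)}_{ijkl} &= \frac{p_{\cdot j\cdot\cdot}}{p^{(4n+1)}_{\cdot j\cdot\cdot}}\, p^{(4n+1)}_{ijkl}, \\
p^{(4n+3)}_{ijkl} &= \frac{p_{\cdot\cdot k\cdot}}{p^{(4n+2)}_{\cdot\cdot k\cdot}}\, p^{(4n+2)}_{ijkl}, &
p^{(4n+4)}_{ijkl} &= \frac{p_{\cdot\cdot\cdot l}}{p^{(4n+3)}_{\cdot\cdot\cdot l}}\, p^{(4n+3)}_{ijkl}.
\end{aligned} \tag{2.10}$$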

Let the given marginals be 

(2.11)  $p_{i...}$, $p_{.j..}$, and $p_{..kl}$. 
Then the problem is essentially that of a three-way table $p_{ijm}$ where $m = (k, l) = 1, \ldots, dt$. Since 
given $p_{..kl}$ determines the marginals $p_{..k.}$ and $p_{...l}$, that is (2.11) $\Rightarrow$ (2.8), 

$\sum_{ijkl} p^*_{ijkl} \ln \frac{p^*_{ijkl}}{\pi_{ijkl}} \le \sum_{ijm} p^*_{ijm} \ln \frac{p^*_{ijm}}{\pi_{ijm}}$. 

Let the given marginals be 

(2.12)  $p_{ij..}$, $p_{i.k.}$, $p_{i..l}$, $p_{.jk.}$, $p_{.j.l}$, $p_{..kl}$, 
then 

(2.13)  $p^*_{ijkl} = a_{ij} b_{ik} c_{il} d_{jk} e_{jl} f_{kl} \pi_{ijkl}$, 

$p_{ij..} = a_{ij} \sum_{kl} b_{ik} c_{il} d_{jk} e_{jl} f_{kl} \pi_{ijkl}$, 

. . . 

$p_{..kl} = f_{kl} \sum_{ij} a_{ij} b_{ik} c_{il} d_{jk} e_{jl} \pi_{ijkl}$, 

and the iterative solution of the system (2.13) cycles through 

(2.14)  $p^{(6n+1)}_{ijkl} = \frac{p_{ij..}}{p^{(6n)}_{ij..}} p^{(6n)}_{ijkl}$,  . . . ,  $p^{(6n+6)}_{ijkl} = \frac{p_{..kl}}{p^{(6n+5)}_{..kl}} p^{(6n+5)}_{ijkl}$. 

Since the marginals given in (2.12) determine those given in (2.8) and (2.11), i.e., 

(2.12) $\Rightarrow$ (2.11) $\Rightarrow$ (2.8), it follows that 

(2.15)  $I(p^* : \pi \mid 2.8) \le I(p^* : \pi \mid 2.11) \le I(p^* : \pi \mid 2.12)$. 
If the given marginals are 

(2.16)  $p_{ijk.}$, $p_{ij.l}$, $p_{i.kl}$, and $p_{.jkl}$, 
then 

(2.17)  $p^*_{ijkl} = a_{ijk} b_{ijl} c_{ikl} d_{jkl} \pi_{ijkl}$, 

$p_{ijk.} = a_{ijk} \sum_l b_{ijl} c_{ikl} d_{jkl} \pi_{ijkl}$, 

. . . 

$p_{.jkl} = d_{jkl} \sum_i a_{ijk} b_{ijl} c_{ikl} \pi_{ijkl}$, 
and the iterative solution of the system (2.17) cycles through 

$p^{(4n+1)}_{ijkl} = \frac{p_{ijk.}}{p^{(4n)}_{ijk.}} p^{(4n)}_{ijkl}$,  . . . ,  $p^{(4n+4)}_{ijkl} = \frac{p_{.jkl}}{p^{(4n+3)}_{.jkl}} p^{(4n+3)}_{ijkl}$. 

Since (2.16) $\Rightarrow$ (2.12) $\Rightarrow$ (2.11) $\Rightarrow$ (2.8), it follows that 

(2.18)  $I(p^* : \pi \mid 2.8) \le I(p^* : \pi \mid 2.11) \le I(p^* : \pi \mid 2.12) \le I(p^* : \pi \mid 2.16)$. 
These relationships will be useful in the construction of analysis of information tables to be 
described in subsections 3.1 and 3.2. The appropriate choice of $\pi_{ijkl}$ will also be deferred for 
later discussion. 
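
The cyclic schemes (2.10), (2.14), and the one for (2.17) differ only in which marginals are visited, so they can be sketched with one general routine. The Python sketch below is an assumption-laden illustration rather than the original program: the helper names `marginal`, `ipf`, and `information`, the fixed cycle count, the table size, and the randomly generated table that supplies the marginals are all hypothetical. With a uniform $\pi$ it also checks the ordering (2.18) numerically.

```python
import itertools
import numpy as np

def marginal(p, keep):
    """Sum p over every axis not listed in `keep` (axes given in ascending order)."""
    drop = tuple(a for a in range(p.ndim) if a not in keep)
    return p.sum(axis=drop)

def ipf(pi, targets, cycles=500):
    """Cycle through the given marginals, rescaling the table to each in turn.

    `targets` is a list of (axes kept, marginal array) pairs.
    """
    q = np.array(pi, dtype=float)
    for _ in range(cycles):
        for keep, target in targets:
            shape = [q.shape[a] if a in keep else 1 for a in range(q.ndim)]
            current = marginal(q, keep).reshape(shape)
            q = q * (np.reshape(target, shape) / current)
    return q

def information(p, q):
    """Discrimination information I(p : q) = sum of p * ln(p / q)."""
    return float(np.sum(p * np.log(p / q)))

rng = np.random.default_rng(0)
shape = (2, 3, 2, 2)                  # hypothetical r, c, d, t
p = rng.random(shape) + 0.5           # made-up table that supplies the marginals
p /= p.sum()
pi = np.full(shape, 1.0 / p.size)     # uniform pi

restraint_sets = {
    "2.8": [(0,), (1,), (2,), (3,)],
    "2.11": [(0,), (1,), (2, 3)],
    "2.12": list(itertools.combinations(range(4), 2)),
    "2.16": list(itertools.combinations(range(4), 3)),
}
fits = {name: ipf(pi, [(s, marginal(p, s)) for s in sets])
        for name, sets in restraint_sets.items()}
values = [information(fits[name], pi) for name in ("2.8", "2.11", "2.12", "2.16")]
```

The list `values` holds $I(p^* : \pi)$ for the four restraint sets in the order of (2.18), so it should be nondecreasing up to rounding error.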

A third result we shall need is 
THEOREM 2.3. The equality 

(2.19)  $I(p : \pi) = I(p : p^*) + I(p^* : \pi)$ 

holds true for $p^*$ computed by the procedure stated in Theorem 2.2, where the $p^*$- and $p$-distributions 
have common specified marginals. 

This is a special case of a property of the minimum discrimination information and can be 
deduced from a theorem in Kullback and Khairat [1966]. We demonstrate the theorem as applied 
to a three-way contingency table when all two-way marginals are considered fixed, and where 

$\sum_k p^*_{ijk} = \sum_k p_{ijk} = p_{ij.}$, 
$\sum_j p^*_{ijk} = \sum_j p_{ijk} = p_{i.k}$, 
$\sum_i p^*_{ijk} = \sum_i p_{ijk} = p_{.jk}$. 
We have 

(2.20)  $\sum_{ijk} p^*_{ijk} \ln \frac{p^*_{ijk}}{\pi_{ijk}}$ 

$= \sum_{ijk} p^*_{ijk} \ln a_{ij} + \sum_{ijk} p^*_{ijk} \ln b_{jk} + \sum_{ijk} p^*_{ijk} \ln c_{ik}$ 
$= \sum_{ij} p_{ij.} \ln a_{ij} + \sum_{jk} p_{.jk} \ln b_{jk} + \sum_{ik} p_{i.k} \ln c_{ik}$ 

$= \sum_{ijk} p_{ijk} \ln \frac{p^*_{ijk}}{\pi_{ijk}}$. 

Hence, 

$I(p : \pi) = \sum_{ijk} p_{ijk} \ln \frac{p_{ijk}}{\pi_{ijk}}$ 

$= \sum_{ijk} p_{ijk} \ln \frac{p_{ijk}}{p^*_{ijk}} + \sum_{ijk} p_{ijk} \ln \frac{p^*_{ijk}}{\pi_{ijk}}$ 

$= I(p : p^*) + I(p^* : \pi)$. 
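
The identity (2.19) is easy to check numerically. The following Python sketch does so for a three-way table with all two-way marginals fixed; it is an illustration under stated assumptions only: the function names, the table dimensions, the number of cycles, and the two randomly generated tables standing in for $p$ and $\pi$ are hypothetical and do not come from the paper.

```python
import numpy as np

def fit_two_way_marginals(pi, p, cycles=500):
    """Iterate pi onto the three two-way marginals of p (three-way table)."""
    q = np.array(pi, dtype=float)
    for _ in range(cycles):
        q *= (p.sum(axis=2) / q.sum(axis=2))[:, :, None]   # match p_ij.
        q *= (p.sum(axis=1) / q.sum(axis=1))[:, None, :]   # match p_i.k
        q *= (p.sum(axis=0) / q.sum(axis=0))[None, :, :]   # match p_.jk
    return q

def information(p, q):
    """Discrimination information I(p : q) = sum of p * ln(p / q)."""
    return float(np.sum(p * np.log(p / q)))

rng = np.random.default_rng(1)
shape = (2, 3, 4)                      # hypothetical r, c, d
p = rng.random(shape) + 0.5            # made-up p-distribution
p /= p.sum()
pi = rng.random(shape) + 0.5           # made-up pi-distribution
pi /= pi.sum()

p_star = fit_two_way_marginals(pi, p)
lhs = information(p, pi)
rhs = information(p, p_star) + information(p_star, pi)
```

Here `lhs` and `rhs` should agree to rounding error once the iteration has converged, since `p_star` then shares all two-way marginals with `p` and has the product form of Theorem 2.1A.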

2.3. The No-Interaction Hypothesis as a Form of "Generalized Independence" 

The formulation of no second-order interaction given in Theorem 2.1A suggests that all no-interaction 
hypotheses can be defined in a similar manner, depending on the marginals which are 
considered given and fixed. Since higher-order marginals determine all lower-order marginals, 
it is natural to consider the $\pi$-distribution as the uniform distribution, corresponding to the case 
where no marginals are specified, as a general form of independence. For the uniform distribution, 
$p^*_{ijkl} = \pi_{ijkl} = 1/rcdt$ in a four-way contingency table. Given all one-way marginals and taking 
$\pi_{ijkl} = 1/rcdt$, we obtain 

(2.21)  $p^*_{ijkl} = p_{i...} p_{.j..} p_{..k.} p_{...l}$, 

i.e., the distribution which "most resembles" the uniform distribution subject to the four one-way 
marginal restraints. Derivation of (2.21) follows exactly as that of (2.6). Given all two-way marginals, 
the distribution that "most resembles" the uniform distribution subject to the six two-way 
marginal restraints is the $p^*$-distribution representing no second-order interaction. Given all 
four three-way marginals, we obtain $p^*$ corresponding to no third-order interaction. 

In this sense the no-interaction hypotheses can be considered as generalized forms of 
independence hypotheses, where the "degrees" of independence that can be realized depend on 
the marginal restraints imposed. Each time we add a restraint, we obtain a $p^*$-distribution 
corresponding to the condition of minimum discrimination information subject to the additional 
restraints, and corresponding to the appropriate null hypothesis to be tested given this additional 
restraint. 

Hence we may state the principle of minimum discrimination information for the generation 
of appropriate hypotheses: 

"If certain marginal probabilities of a contingency table are considered given or fixed, then 
the appropriate interaction hypothesis to be tested, subject to these fixed marginal restraints, 
is the hypothesis represented by the unique set of cell probabilities $p^*_{ijkl}$ satisfying these restraints 
and yielding the minimum value of discrimination information 

(2.22)  $I(p : \pi) = \sum_{ijkl} p_{ijkl} \ln p_{ijkl} + \ln rcdt$ 

for all $p_{ijkl}$." 

It can be shown that all the usual "classical" hypotheses can be generated by the application 
of this principle. If complete sets of marginals are considered given in a four-way table, we arrive 
at the following sequence. 

Marginals considered as fixed                                              No-interaction hypothesis 

$p_{....} = 1$                                                              $p^*_0 = 1/rcdt$       zeroth-order (uniform). 
$p_{i...}$, $p_{.j..}$, $p_{..k.}$, $p_{...l}$                              $p^*_1$                first-order (4-way independence). 
$p_{ij..}$, $p_{i.k.}$, $p_{i..l}$, $p_{.jk.}$, $p_{.j.l}$, $p_{..kl}$      $p^*_2$                second-order. 
$p_{ijk.}$, $p_{ij.l}$, $p_{i.kl}$, $p_{.jkl}$                              $p^*_3$                third-order. 
$p_{ijkl}$                                                                  $p^*_4 = p_{ijkl}$     fourth-order (no test). 



If only part of a complete set of marginals is given, a conditional type of independence is 
generated. Some of these hypotheses for which the $p^*_{ijkl}$ can be explicitly expressed in terms of 
the marginals are given in table 2.1. We demonstrate here the generation of the conditional 
independence in a three-way table when two of the two-way marginals, $p_{ij.}$ and $p_{.jk}$, are assumed to 
be fixed. 

TABLE 2.1. Some explicit expressions for $p^*$ 

Marginal restraints                                   $p^*_{ijkl}$ 

$p_{....} = 1$                                        $1/rcdt$ 
$p_{i...}$                                            $p_{i...}/cdt$ 
$p_{i...}$, $p_{.j..}$                                $p_{i...} p_{.j..}/dt$ 
$p_{i...}$, $p_{.j..}$, $p_{..k.}$                    $p_{i...} p_{.j..} p_{..k.}/t$ 
$p_{i...}$, $p_{.j..}$, $p_{..k.}$, $p_{...l}$        $p_{i...} p_{.j..} p_{..k.} p_{...l}$ 
$p_{ij..}$                                            $p_{ij..}/dt$ 
$p_{ij..}$, $p_{..kl}$                                $p_{ij..} p_{..kl}$ 
$p_{.j..}$, $p_{i..l}$, $p_{..kl}$                    $p_{.j..} p_{i..l} p_{..kl}/p_{...l}$ 
$p_{ij..}$, $p_{i.k.}$, $p_{i..l}$                    $p_{ij..} p_{i.k.} p_{i..l}/(p_{i...})^2$ 
$p_{ij..}$, $p_{i..l}$, $p_{..kl}$                    $p_{ij..} p_{i..l} p_{..kl}/(p_{i...} p_{...l})$ 
$p_{ij..}$, $p_{i.k.}$, $p_{.jk.}$                    no explicit expression 

By Theorem 2.1A we have 

$p^*_{ijk} = a_{ij} b_{jk} \pi_{ijk}$, 

subject to the restraints 

$\sum_i p^*_{ijk} = b_{jk} \sum_i a_{ij} \pi_{ijk} = p_{.jk}$, 

$\sum_k p^*_{ijk} = a_{ij} \sum_k b_{jk} \pi_{ijk} = p_{ij.}$. 

Hence 

(2.23)  $p^*_{ijk} = \frac{p_{.jk}}{\sum_i a_{ij} \pi_{ijk}} \cdot \frac{p_{ij.}}{\sum_k b_{jk} \pi_{ijk}} \cdot \pi_{ijk}$. 

If we let $\pi_{ijk} = 1/rcd$, then 

(2.24)  $p^*_{ijk} = \frac{p_{.jk} p_{ij.}}{\sum_{ik} a_{ij} b_{jk} \pi_{ijk}} = \frac{p_{.jk} p_{ij.}}{p_{.j.}}$, 

or the conditional independence hypothesis of the row and depth classification given the column 
classification. 
It is of interest to note that the same expression for $p^*_{ijk}$ in (2.24) is obtained whether we use 
(2.25)  $\pi_{ijk} = 1/rcd$, 
$\pi_{ijk} = \pi_{i..} \pi_{.j.} \pi_{..k}$, or 
$\pi_{ijk} = \pi_{ij.} \pi_{.jk}/\pi_{.j.}$. 

Thus, as long as $\pi_{ijk}$ represents the condition of generalized independence corresponding to marginal 
restraints of an order lower or equal to the given marginal restraints, the same set of $p^*_{ijk}$ will 
be generated by our procedure. In the present case 

given $(p_{ij.}, p_{.jk}) \Rightarrow (p_{i..}, p_{.j.}, p_{..k}) \Rightarrow p_{...}$. 

A simple proof of this property is as follows. The set of $p^*_{ijk}$ is obtained by minimizing the 
discrimination information 

$I(p : \pi) = \sum_{ijk} p_{ijk} \ln \frac{p_{ijk}}{\pi_{ijk}}$ 

subject to given marginal restraints. Now 

(2.26)  $I(p : \pi) = \sum_{ijk} p_{ijk} \ln p_{ijk} - \sum_{ijk} p_{ijk} \ln \pi_{ijk}$ 

and the second term of the expression on the right-hand side of (2.26) reduces to a constant no 
matter which form of $\pi_{ijk}$ in (2.25) is used. Hence the same $p^*_{ijk}$ will give the minimum value of 
(2.26). Good [1966] showed that the chains of hypotheses generated by the principle of minimum 
discriminability depend only on the increasing sequence of linear constraints, irrespective of 
which of the existing hypotheses the new ones are referred to. 
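
The invariance claimed for the three choices of $\pi$ in (2.25) can be illustrated numerically. The Python sketch below is a hedged example, not part of the original analysis: the function name `fit_ij_jk`, the table size, and the random tables `p` (supplying the fixed marginals) and `w` (used only to build the non-uniform $\pi$ choices) are all assumptions. Each starting table is iterated onto $p_{ij.}$ and $p_{.jk}$ and compared with the closed form (2.24).

```python
import numpy as np

def fit_ij_jk(pi, p, cycles=5):
    """Iterate pi onto the two marginals p_ij. and p_.jk of a three-way table."""
    q = np.array(pi, dtype=float)
    for _ in range(cycles):
        q *= (p.sum(axis=2) / q.sum(axis=2))[:, :, None]   # match p_ij.
        q *= (p.sum(axis=0) / q.sum(axis=0))[None, :, :]   # match p_.jk
    return q

rng = np.random.default_rng(2)
shape = (2, 3, 4)                       # hypothetical r, c, d
p = rng.random(shape) + 0.5             # made-up table supplying the marginals
p /= p.sum()
w = rng.random(shape) + 0.5             # made-up table used to build pi
w /= w.sum()

# The three forms of pi listed in (2.25).
pi_uniform = np.full(shape, 1.0 / p.size)
pi_product = (w.sum(axis=(1, 2))[:, None, None]
              * w.sum(axis=(0, 2))[None, :, None]
              * w.sum(axis=(0, 1))[None, None, :])
pi_conditional = (w.sum(axis=2)[:, :, None] * w.sum(axis=0)[None, :, :]
                  / w.sum(axis=(0, 2))[None, :, None])

# Closed form (2.24): p_ij. * p_.jk / p_.j.
closed_form = (p.sum(axis=2)[:, :, None] * p.sum(axis=0)[None, :, :]
               / p.sum(axis=(0, 2))[None, :, None])
fits = [fit_ij_jk(pi, p) for pi in (pi_uniform, pi_product, pi_conditional)]
```

All three fitted tables should coincide with `closed_form`; for these starting tables a single cycle already suffices, and the extra cycles leave the result unchanged.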



2.4. Consistency of Information-Theoretic Definition of No Second-Order Interaction With Other Formulations 

We shall show that the definition of no second-order interaction given in Theorem 2.1A is 
consistent with the formulations given by Bartlett, and Roy and Kastenbaum represented by 
(1.1) and (1.2). We may remark here also that the $p^*$-distribution satisfies all four requirements 
in subsection 1.4. The $p^*$-distribution satisfies the requirements of (a) fixed marginal totals, (b) 
symmetry, and (c) unique set of cell probabilities because of the way it is derived. Additivity is a 
property of m.d.i.s. [Theorem 2.1, ch. 2, Kullback 1959] which facilitates the construction of analysis 
of information tables. 

For an $r \times c$ table with given set of cell probabilities $p_{ij}$ and $\pi_{ij}$, let us find the table with the 
same marginals as $p_{ij}$ but minimizing the expression 

$f(d_{ij}) = \sum_{ij} (p_{ij} + d_{ij}) \ln \frac{p_{ij} + d_{ij}}{\pi_{ij}}$ 

with 

$\sum_i d_{ij} = \sum_j d_{ij} = 0$. 
Since $f(d_{ij})$ is a convex function, its minimum is given by the $d_{ij}$'s satisfying the set of equations 

$\frac{\partial f(d_{ij})}{\partial d_{ij}} = 0 = \ln \frac{p_{ij} + d_{ij}}{\pi_{ij}} - \ln \frac{p_{ic} + d_{ic}}{\pi_{ic}} - \ln \frac{p_{rj} + d_{rj}}{\pi_{rj}} + \ln \frac{p_{rc} + d_{rc}}{\pi_{rc}}$. 

For the values of $d_{ij}$'s satisfying the above let $p_{ij} + d_{ij} = p^*_{ij}$, then the set of equations reduces to 

(2.27)  $\frac{p^*_{ij}}{\pi_{ij}} \cdot \frac{p^*_{rc}}{\pi_{rc}} = \frac{p^*_{ic}}{\pi_{ic}} \cdot \frac{p^*_{rj}}{\pi_{rj}}$ 

and all are satisfied by 

$p^*_{ij} = a_i b_j \pi_{ij}$. 

This procedure is essentially what Bartlett used in getting a solution to the no second-order 
interaction hypothesis in a $2 \times 2 \times 2$ table. Bartlett specified the condition of no second-order 
interaction to be 

(2.28)  $\frac{p_{111} p_{122}}{p_{112} p_{121}} = \frac{p_{211} p_{222}}{p_{212} p_{221}}$. 

For observed cell frequencies $x_{ijk}$, he solved for $\Delta$ in the equation 

(2.29)  $\frac{(x_{111} + \Delta)(x_{122} + \Delta)}{(x_{112} - \Delta)(x_{121} - \Delta)} = \frac{(x_{211} - \Delta)(x_{222} - \Delta)}{(x_{212} + \Delta)(x_{221} + \Delta)}$, 

and computed 

$X^2 = \Delta^2 \sum_{ijk} \frac{1}{x^*_{ijk}}$. 

In fact, Bartlett's $X^2$ is an approximation to $2nI$ given in (3.3). If we let $x_{ijk} = x^*_{ijk} \pm \Delta$ and expand 
$x_{ijk} \ln x_{ijk}$ about $x^*_{ijk} \ln x^*_{ijk}$ by a Taylor series expansion up to $\Delta^2$, we have 

$x_{ijk} \ln x_{ijk} = x^*_{ijk} \ln x^*_{ijk} \pm \Delta (\ln x^*_{ijk} + 1) + \frac{\Delta^2}{2} \cdot \frac{1}{x^*_{ijk}}$. 

Summing over $i, j, k = 1, 2$, 

$\sum_{ijk} x_{ijk} \ln x_{ijk} = \sum_{ijk} x^*_{ijk} \ln x^*_{ijk} + \sum_{ijk} (\pm \Delta)(\ln x^*_{ijk} + 1) + \frac{\Delta^2}{2} \sum_{ijk} \frac{1}{x^*_{ijk}}$. 

Since the middle term on the right-hand side is equal to zero by (2.28), 

$2nI = 2 \sum_{ijk} x_{ijk} \ln \frac{x_{ijk}}{x^*_{ijk}} \approx \Delta^2 \sum_{ijk} \frac{1}{x^*_{ijk}} = X^2$. 

Also it can be checked readily by substitution that if a $\pi_{ijk}$ distribution satisfies (2.28), then the 
$p^*_{ijk}$ distribution also satisfies (2.28) when $p^*_{ijk} = a_{ij} b_{jk} c_{ik} \pi_{ijk}$. The same is true for the set of 
conditions of no second-order interaction in an $r \times c \times d$ table given by Roy and Kastenbaum in eq (1.2). 
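
The closeness of Bartlett's $X^2$ to $2nI$ can be illustrated numerically. The Python sketch below solves (2.29) for $\Delta$ by bisection and compares the two statistics. It is a sketch under stated assumptions: the counts in `x` are invented for the example (they are not Bartlett's data), and the use of bisection and the name `log_cross_ratio` are choices made here, not a description of Bartlett's own computation.

```python
import numpy as np

# Hypothetical 2x2x2 counts x[i, j, k]; index 0 corresponds to category 1.
x = np.array([[[50.0, 40.0], [30.0, 45.0]],
              [[35.0, 48.0], [42.0, 38.0]]])

# Sign pattern of (2.29): +Delta on cells 111, 122, 212, 221; -Delta on the others.
i, j, k = np.indices(x.shape)
sign = np.where((i + j + k) % 2 == 0, 1.0, -1.0)

def log_cross_ratio(delta):
    """Zero exactly when the table x + sign * delta satisfies condition (2.28)."""
    return float(np.sum(sign * np.log(x + sign * delta)))

# Bisection: the function increases with delta while all adjusted cells stay positive.
lo = -x[sign > 0].min() + 1e-9
hi = x[sign < 0].min() - 1e-9
for _ in range(200):
    mid = 0.5 * (lo + hi)
    if log_cross_ratio(mid) < 0:
        lo = mid
    else:
        hi = mid
delta = 0.5 * (lo + hi)

x_star = x + sign * delta                        # fitted no-interaction frequencies
chi_square = delta**2 * np.sum(1.0 / x_star)     # Bartlett's X^2
mdis = 2.0 * np.sum(x * np.log(x / x_star))      # 2nI of (3.3)
```

Adding and subtracting the same $\Delta$ in this pattern leaves every two-way marginal of `x` unchanged, which is why `x_star` can serve as the no-interaction table with the observed marginals.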
It is to be noted also that Good's Principle of Minimal Discriminability is essentially the same 
as our Principle of Minimum Discrimination Information. Darroch's suggested procedure and our 
procedure also share a number of similarities. However, we have demonstrated that many hypotheses 
of interest for contingency tables can be generated through one unified procedure based 
on information theory, and the hypothesis of no second-order interaction is no exception. Furthermore, 
the properties of the minimum discrimination information statistics, convexity and additivity, 
allow us to derive and interpret these statistics naturally, and to construct the analysis of information 
tables for an overall comparison between the sample and the nested sequence of hypotheses. 
For other related literature dealing with a similar problem but not in the testing of no-interaction 
hypotheses in contingency tables, we cite P. M. Lewis II [1959] and D. T. Brown [1959]. 
Lewis, on an ad hoc basis, considered approximations to a discrete probability distribution over 
the space of binary random variables by distributions which are products of various marginal 
distributions of the original one and measured the goodness of fit by the discrimination information 
measure. D. T. Brown extended this notion to using an approximating distribution having certain 
marginals the same as those of the original distribution. He described an iterative procedure 
which is the same as the example following Theorem 2.2, and showed that the goodness of the 
approximation improves at each step of the iteration using discrimination information as a measure 
of goodness. The procedure he described is the special case of an n-way $2 \times 2 \times \cdots \times 2$ table 
with initial cell probabilities all equal to $1/2^n$. 

3. Analysis of Multidimensional Contingency Tables 
and the Interpretation of Results 

3.1. The m.d.i.s. for the No-Interaction Hypothesis 

In the last section we have shown that the no-interaction hypothesis can be considered as a 
hypothesis of generalized independence, subject to fixed marginal restraints. Furthermore, the 
unique set of $p^*$-distributions can be computed by a convergent iterative process alternately 
satisfying the given marginals. 

Let $p_0$ represent the cell probabilities under the hypothesis of uniform distribution. We say 
that $\{p^*\}$ is the table that most resembles $\{p_0\}$ subject to these marginal restraints, or there is 
no interaction between $\{p^*\}$ and $\{p_0\}$ in the sense that 

$I(p^* : p_0) = \sum p^* \ln \frac{p^*}{p_0}$ 

is a minimum for all p's consistent with the dimension of the table and the given restraints. For any 
other table we may write 

(3.1)  $\sum p \ln \frac{p}{p_0} = \sum p^* \ln \frac{p^*}{p_0} + \left[ \sum p \ln \frac{p}{p_0} - \sum p^* \ln \frac{p^*}{p_0} \right]$. 

Since the first term on the right represents the condition of no interaction, the term in the bracket 

(3.2)  $\sum p \ln \frac{p}{p_0} - \sum p^* \ln \frac{p^*}{p_0} = \sum p \ln \frac{p}{p^*}$ 

is then a measure of interaction, or the departure of the p-distribution from the no-interaction 
distribution. The equality in (3.2) follows from Theorem 2.3. 

Given an observed sample with cell frequencies $x_{ijk}$ in a three-way table, and let $x_{ij.}$, $x_{i.k}$, and 
$x_{.jk}$ be the given marginal restraints, then the m.d.i.s. for testing the hypothesis of no second-order 
interaction is 

(3.3)  $2nI(p : p^*) = 2 \sum_{ijk} x_{ijk} \ln \frac{x_{ijk}}{x^*_{ijk}}$, 

where $x^*_{ijk} = n p^*_{ijk}$, and $p_{ijk} = x_{ijk}/n$. 
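
As a hedged illustration of (3.3), the Python sketch below fits the no second-order interaction frequencies to a three-way table of counts and evaluates the statistic. The counts are invented for the example, and the function name `fit_no_second_order`, the uniform starting table, and the fixed cycle count are assumptions made here; referring the result to a chi-squared table is left to the reader.

```python
import numpy as np

def fit_no_second_order(x, cycles=500):
    """Iterate a uniform table onto the three two-way marginals of the counts x."""
    q = np.full(x.shape, x.sum() / x.size)   # uniform start, scaled to the sample size n
    for _ in range(cycles):
        q *= (x.sum(axis=2) / q.sum(axis=2))[:, :, None]   # match x_ij.
        q *= (x.sum(axis=1) / q.sum(axis=1))[:, None, :]   # match x_i.k
        q *= (x.sum(axis=0) / q.sum(axis=0))[None, :, :]   # match x_.jk
    return q

# Hypothetical 2 x 2 x 3 table of observed frequencies.
x = np.array([[[20.0, 15.0, 12.0], [10.0, 18.0, 25.0]],
              [[14.0, 22.0, 9.0], [16.0, 11.0, 28.0]]])

x_star = fit_no_second_order(x)
statistic = 2.0 * np.sum(x * np.log(x / x_star))   # 2nI(p : p*) of (3.3)
r, c, d = x.shape
dof = (r - 1) * (c - 1) * (d - 1)                   # degrees of freedom for the test
residuals = x - x_star                              # x - x*, one value per cell
```

The array `residuals` is the set of differences between observed and fitted frequencies, which can be inspected cell by cell alongside the overall statistic.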

The distribution of $2nI(p : p^*)$ is asymptotically chi-squared, as shown in Kullback [1959] and 
Kupperman [1957], with $(r-1)(c-1)(d-1)$ degrees of freedom. The same result is obtained if we note 
that the $p^*_{ijk}$'s are RBAN estimators in the sense of Neyman [1949] of the cell probabilities under the 
hypothesis of no second-order interaction, or CAN estimators of Rao [1965, p. 288]. The degrees 
of freedom are calculated from the general principle of equivalence of degrees of freedom and the 
number of independent restraints imposed by the specified marginals. In this case the degrees 
of freedom are 

(3.4)  $rcd - 1 - (r-1)(c-1) - (r-1)(d-1) - (c-1)(d-1) - (r-1) - (c-1) - (d-1)$ 

$= (r-1)(c-1)(d-1)$. 

For a four-way table, the relationship between various interactions corresponding to completely 
specified sets of marginals is given in table 3.1. The following notations are used in the 
computation of degrees of freedom: 

(3.5)  $N = rcdt - 1 = N_1 + N_2 + N_3 + N_4$ 
$N_1 = (r-1) + (c-1) + (d-1) + (t-1)$ 
$N_2 = (r-1)(c-1) + (r-1)(d-1) + (r-1)(t-1) + (c-1)(d-1) + (c-1)(t-1) + (d-1)(t-1)$ 
$N_3 = (r-1)(c-1)(d-1) + (r-1)(c-1)(t-1) + (r-1)(d-1)(t-1) + (c-1)(d-1)(t-1)$ 
$N_4 = (r-1)(c-1)(d-1)(t-1)$. 
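
The quantities in (3.5) are sums of products of the reduced dimensions taken one, two, three, and four at a time, so they are simple to compute. The short Python sketch below does this; the function name `interaction_dof` and the table dimensions are hypothetical, chosen only to illustrate the bookkeeping.

```python
from itertools import combinations
from math import prod

def interaction_dof(dims):
    """Return [N_1, ..., N_m], where N_k sums products of (size - 1) over all k-subsets."""
    reduced = [n - 1 for n in dims]
    return [sum(prod(subset) for subset in combinations(reduced, k))
            for k in range(1, len(dims) + 1)]

dims = (2, 3, 4, 5)               # hypothetical r, c, d, t
N1, N2, N3, N4 = interaction_dof(dims)
N = prod(dims) - 1                # rcdt - 1

print(N1, N2, N3, N4, N)          # prints: 10 35 50 24 119
```

For these dimensions $N_1 + N_2 + N_3 + N_4 = 10 + 35 + 50 + 24 = 119 = rcdt - 1$, in agreement with the first line of (3.5).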

Table 3.1. Analysis of information: four-way table 

Information                 For testing the hypothesis of                     Degrees of freedom 

$2nI(p : p^*_3)$            No third-order interaction                        $N - N_1 - N_2 - N_3 = N_4$ 
$2nI(p^*_3 : p^*_2)$                                                          $N_3$ 

$2nI(p : p^*_2)$            No second-order interaction                       $N - N_1 - N_2$ 
$2nI(p^*_2 : p^*_1)$                                                          $N_2$ 

$2nI(p : p^*_1)$            Independence (no first-order interaction)         $N - N_1$ 
$2nI(p^*_1 : p_0)$                                                            $N_1$ 

$2nI(p : p_0)$              Uniformity                                        $N$ 

In table 3.1 we have specified a complete set of marginals as fixed for each hypothesis. This 
restriction is clearly unnecessary. We shall define the $p^*$-distribution generated by a partial set 
of marginals or a mixed set of marginals of different order as the no "mixed-order interaction" 
hypothesis. An example of mixed-order interaction is the conditional independence hypothesis 
given by (2.24), where only two of the two-way marginals are specified in a three-way table. 

To simplify the notation for a step by step analysis of a four-way table, let us denote the 
quantity 

$-\sum_{ijkl} p^*_{ijkl} \ln p^*_{ijkl}$ 

by $H(\ )$ where the indices within the bracket stand for the marginals that are considered fixed. 
If all the two-way marginals are considered fixed, we may write H (given two-way), or other 
descriptive phrases with defined meanings. The symbol $I(\ )$ will be used to denote the difference 
between $H(\ )$ and H (data). 

We note that the "H" notation used here is the notation for entropy. For the case $\pi_{ijkl} = 1/rcdt$, 
the problem of minimizing $I(p : \pi)$ in (2.4) subject to certain restraints is equivalent to minimizing 

(3.6)  $\sum_{ijkl} p_{ijkl} \ln p_{ijkl} + \ln rcdt$, 
or maximizing the entropy 

(3.7)  $-\sum_{ijkl} p_{ijkl} \ln p_{ijkl}$ 

subject to the same restraints. The latter problem has been considered by Good [1963, 1965, 1966]. 
Since higher-order marginals determine all lower marginals, we have, corresponding to (2.18), 
the following: 

(3.8)  H (data) $\le$ H (given 3-way) $= H_3$ 
$\le$ H (given 2-way) $= H_2$ 
$\le$ H (given 1-way) $= H_1$ 
$\le$ H (uniform) $= H_0 = \ln rcdt$. 
Hence, 

(3.9)  I (given 1-way) $= H_1 - H$ (data) 
$= [H_2 - H$ (data)$] + (H_1 - H_2)$ 
$\ge$ I (given 2-way), 

(3.10)  I (given 2-way) $= H_2 - H$ (data) 
$= [H_3 - H$ (data)$] + (H_2 - H_3)$ 
$\ge$ I (given 3-way) $\ge 0$. 

In table 3.2 the two-way marginals are added one by one to the four one-way marginals until 
all six are specified; then the three-way marginals are added one by one to the complete set of 
two-way marginals until all four are specified. The components of information are expressed in 
terms of the differences in entropies, and when possible, also in terms of the form that can be 
obtained through the convexity property of m.d.i.s. 

TABLE 3.2. Interactions in a four-way table 

Each entry lists the marginal restraints added, the resulting $p^*_{ijkl}$ where it is explicit, and then 
the information components with the form obtained by the convexity property and the d.f. 

Restraints $p_{i...}$, $p_{.j..}$, $p_{..k.}$, $p_{...l}$ 
  $p^*_{ijkl} = p_{i...} p_{.j..} p_{..k.} p_{...l}$ 
  $H(i, j, k, l) - H(\text{data})$;  $\sum_{ijkl} p_{ijkl} \ln \frac{p_{ijkl}}{p_{i...} p_{.j..} p_{..k.} p_{...l}}$;  d.f. $rcdt - 1 - (r-1) - (c-1) - (d-1) - (t-1) = N - N_1$ 

Restraint added: $p_{..kl}$ 
  $p^*_{ijkl} = p_{i...} p_{.j..} p_{..kl}$ 
  $H(i, j, k, l) - H(i, j, kl) = H(k, l) - H(kl)$;  $\sum_{kl} p_{..kl} \ln \frac{p_{..kl}}{p_{..k.} p_{...l}}$;  d.f. $(d-1)(t-1)$ 
  $H(i, j, kl) - H(\text{data})$;  $\sum_{ijkl} p_{ijkl} \ln \frac{p_{ijkl}}{p_{i...} p_{.j..} p_{..kl}}$;  d.f. $N - N_1 - (d-1)(t-1)$ 

Restraint added: $p_{i..l}$ 
  $p^*_{ijkl} = p_{.j..} \, p_{..kl} p_{i..l} / p_{...l}$ 
  $H(i, j, kl) - H(j, kl, il) = H(i, l) - H(il)$;  $\sum_{il} p_{i..l} \ln \frac{p_{i..l}}{p_{i...} p_{...l}}$;  d.f. $(r-1)(t-1)$ 
  $H(j, kl, il) - H(\text{data})$;  $\sum_{ijkl} p_{ijkl} \ln \frac{p_{ijkl}}{p_{.j..} (p_{..kl} p_{i..l} / p_{...l})}$;  d.f. $N - N_1 - (d-1)(t-1) - (r-1)(t-1)$ 

Restraint added: $p_{ij..}$ 
  $p^*_{ijkl} = \frac{p_{ij..} p_{..kl} p_{i..l}}{p_{i...} p_{...l}}$ 
  $H(j, kl, il) - H(kl, il, ij) = H(i, j) - H(ij)$;  $\sum_{ij} p_{ij..} \ln \frac{p_{ij..}}{p_{i...} p_{.j..}}$;  d.f. $(r-1)(c-1)$ 
  $H(kl, il, ij) - H(\text{data})$;  $\sum_{ijkl} p_{ijkl} \ln \frac{p_{ijkl}}{p_{ij..} p_{..kl} p_{i..l} / (p_{i...} p_{...l})}$;  d.f. $N - N_1 - (d-1)(t-1) - (r-1)(t-1) - (r-1)(c-1)$ 

Restraint added: $p_{.jk.}$ 
  $H(kl, il, ij) - H(kl, il, ij, jk)$;  $\sum_{jk} p_{.jk.} \ln \frac{p_{.jk.}}{\sum_{il} p_{ij..} p_{..kl} p_{i..l} / (p_{i...} p_{...l})}$;  d.f. $(c-1)(d-1)$ 
  $H(kl, il, ij, jk) - H(\text{data})$;  d.f. $N - N_1 - (d-1)(t-1) - (r-1)(t-1) - (r-1)(c-1) - (c-1)(d-1)$ 

Restraint added: $p_{i.k.}$ 
  $H(kl, il, ij, jk) - H(kl, il, ij, jk, ik)$;  iteration on indicated two-way marginals;  d.f. $(r-1)(d-1)$ 
  $H(kl, il, ij, jk, ik) - H(\text{data})$;  d.f. $N - N_1 - (d-1)(t-1) - (r-1)(t-1) - (r-1)(c-1) - (c-1)(d-1) - (r-1)(d-1)$ 

Restraint added: $p_{.j.l}$ (second-order interaction) 
  $H(kl, il, ij, jk, ik) - H(\text{given two-way})$;  d.f. $(c-1)(t-1)$ 
  $H(kl, il, ij, jk, ik, jl) - H(\text{data})$;  d.f. $N - N_1 - N_2$ 

Six two-way marginals 
  $H(\text{given two-way}) - H(\text{data})$;  d.f. $N - N_1 - N_2$ 

Restraint added: $p_{ijk.}$ 
  $H(\text{given two-way}) - H(il, jl, kl, ijk)$;  d.f. $(r-1)(c-1)(d-1)$ 
  $H(il, jl, kl, ijk) - H(\text{data})$;  d.f. difference 

Restraint added: $p_{ij.l}$ 
  $H(il, jl, kl, ijk) - H(ijk, ijl)$;  d.f. $(r-1)(c-1)(t-1)$ 
  $H(ijk, ijl) - H(\text{data})$;  d.f. difference 

Restraint added: $p_{i.kl}$ 
  $H(ijk, ijl) - H(ijk, ijl, ikl)$;  d.f. $(r-1)(d-1)(t-1)$ 
  $H(ijk, ijl, ikl) - H(\text{data})$;  d.f. difference 

Restraint added: $p_{.jkl}$ (third-order interaction) 
  $H(ijk, ijl, ikl) - H(\text{given three-way})$;  d.f. $(c-1)(d-1)(t-1)$ 
  $H(\text{given three-way}) - H(\text{data})$;  d.f. $N - N_1 - N_2 - N_3 = (r-1)(c-1)(d-1)(t-1)$ 

We note in table 3.2 that the addition of each two-way marginal restraint generates a hypothesis 
of two-way independence of the corresponding marginal table, or the conditional two-way inde- 
pendence given one or more marginals. Beginning with the fourth marginal restraint, however, 
these hypotheses can no longer be expressed in an explicit closed form and the respective p*- 
distributions will have to be generated by the iteration procedure. 

3.2. Some Typical Second-Order No-Interaction Hypotheses 

In many practical applications, the hypothesis of interest is usually suggested by the physical 
relationship involved in the problem, and the result of the test admits natural interpretations. 
The no second-order interaction in a three-way table originated in this manner, i.e., it is a test 
of the sameness of the measure of association between R and C classifications over categories 
of D [Simpson 1951]. 

With the addition of another dimension in a four-way table, there are a number of mixed 
second-order interactions to which there are no corresponding ones in a three-way table. Some 
of the typical ones are described below. 

The interaction of a one-way by two-way interaction over the fourth classification is a mixed 
second-order interaction. There are six such second-order interactions, corresponding to the six 
three-way tables with different marginal probabilities, and possibly different dimensions, that 
can be constructed from a four-way table. Since symmetry is a property of our procedure, these 
interactions could also be considered as the interaction of one-way by one-way interaction over 
categories of the two remaining classifications. For example, the second-order interaction 
$(DT \times R)(C)$ is the same as the second-order interaction $(R \times C)(DT)$. 

The analysis of information table for these second-order interactions is given in table 3.3, 
using the convexity property of the m.d.i.s. to indicate: 

(1) how this second-order interaction can be derived, and 

(2) the particular marginals which must be specified for the iteration procedure, viz., $p_{ij..}$, 
$p_{.jkl}$, and $p_{i.kl}$ in this case. 

We note that if we consider 

$p^{(0)}_{ijkl} = \frac{p_{ij..} p_{.jkl}}{p_{.j..}}$, 

then 

$p^{(1)}_{ijkl} = \frac{p_{i.kl}}{p^{(0)}_{i.kl}} p^{(0)}_{ijkl} = \frac{p_{ij..} p_{.jkl}}{p_{.j..}} \cdot \frac{p_{i.kl}}{\sum_j p_{ij..} p_{.jkl} / p_{.j..}}$, 

which is exactly the denominator appearing in the expression for second-order interaction. Hence 
the convexity property of the m.d.i.s. is useful in giving an explicit expression for the $p^*_{ijkl}$ value 
after the first iteration. This agreement is not surprising since we are utilizing two distinct but 
consistent properties of the m.d.i.s. 

Viewing the four-way table in another perspective, there are four distinct second-order interactions 
defined as the interaction of the three-way interaction over the fourth classification. The 
analysis for $(R \times D \times T)(C)$ is shown in table 3.4. The marginals to be considered as specified are 

$p_{ij..}$, $p_{.jk.}$, $p_{.j.l}$, and $p_{i.kl}$. 
TABLE 3.3. Mixed-order interactions in a four-way table 

Each entry lists the component, its information, and the d.f. 

$(DT \times R)(C)$, second-order interaction (a) 
  $2 \sum_{ijkl} p_{ijkl} \ln \frac{p_{ijkl}}{\frac{p_{ij..} p_{.jkl}}{p_{.j..}} \cdot \frac{p_{i.kl}}{\sum_j p_{ij..} p_{.jkl} / p_{.j..}}}$;  d.f. $(c-1)(r-1)(dt-1)$ 

$(DT, R)(C)$, three-way marginals interaction with column 
  $2 \sum_{ikl} p_{i.kl} \ln \frac{p_{i.kl}}{\sum_j p_{ij..} p_{.jkl} / p_{.j..}}$;  d.f. $(r-1)(dt-1)$ 

$DT \times R \mid C$, two-way by one-way interaction, given column 
  $2 \sum_{ijkl} p_{ijkl} \ln \frac{p_{ijkl}}{p_{ij..} p_{.jkl} / p_{.j..}}$;  d.f. $c(r-1)(dt-1)$ 

(a) Form shown is the second-order interaction after first iteration. 

Conceptually a third-order interaction in a four-way table may be defined as the interaction 
of the second-order interactions of the three classifications over categories of the fourth 
classification. The analysis following this line of thought is given in table 3.5, showing the marginals to 
be specified are 

$p_{ijk.}$, $p_{i.kl}$, $p_{ij.l}$, and $p_{.jkl}$. 

It is clear that we do not have a direct counterpart of third-order interaction in the classical 
hypotheses. The interpretation of no third-order interaction also becomes obscure in the conven- 
tional sense. We propose, therefore, to consider a hypothesis represented by the p*-distribution 
as that of a generalized independence (generalized no interaction, no association) among the clas- 
sifications with given fixed marginals, and give a unified interpretation in the following subsection. 



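Table 3.4. Mixed-order interactions in a four-way table 

Each entry lists the component, its information, and the d.f. 

$(R \times D \times T)(C)$, second-order interaction (a) 
  $2 \sum_{ijkl} p_{ijkl} \ln \frac{p_{ijkl}}{\frac{p_{ij..} p_{.jk.} p_{.j.l}}{(p_{.j..})^2} \cdot \frac{p_{i.kl}}{\sum_j p_{ij..} p_{.jk.} p_{.j.l} / (p_{.j..})^2}}$;  d.f. $(c-1)(rdt - r - d - t + 2)$ 

$(R, D, T)(C)$, three-way marginals interaction with column 
  $2 \sum_{ikl} p_{i.kl} \ln \frac{p_{i.kl}}{\sum_j p_{ij..} p_{.jk.} p_{.j.l} / (p_{.j..})^2}$;  d.f. $rdt - r - d - t + 2$ 

$R \times D \times T \mid C$, three-way interaction, given column 
  $2 \sum_{ijkl} p_{ijkl} \ln \frac{p_{ijkl}}{p_{ij..} p_{.jk.} p_{.j.l} / (p_{.j..})^2}$;  d.f. $c(rdt - r - d - t + 2)$ 

(a) Form shown is the second-order interaction after the first iteration. 

Table 3.5. Third-order interaction derived from the convexity property 

Each entry lists the component, its information, and the d.f. 

$(R \times T)(D)(C)$, third-order interaction 
  $2 \sum_{ijkl} p_{ijkl} \ln \frac{p_{ijkl}}{\frac{p_{ijk.} p_{.jkl}}{p_{.jk.}} \cdot \frac{p_{ij.l}}{\sum_k p_{ijk.} p_{.jkl} / p_{.jk.}} \cdot \frac{p_{i.kl}}{\sum_j \frac{p_{ijk.} p_{.jkl}}{p_{.jk.}} \cdot \frac{p_{ij.l}}{\sum_k p_{ijk.} p_{.jkl} / p_{.jk.}}}}$;  d.f. $(r-1)(c-1)(d-1)(t-1)$ 

$(R, D, T)(C)$, three-way marginals interaction with column 
  $2 \sum_{ikl} p_{i.kl} \ln \frac{p_{i.kl}}{\sum_j \frac{p_{ijk.} p_{.jkl}}{p_{.jk.}} \cdot \frac{p_{ij.l}}{\sum_k p_{ijk.} p_{.jkl} / p_{.jk.}}}$;  d.f. $(r-1)(d-1)(t-1)$ 

$(R \times T)(D) \mid C$, second-order interaction, given column (a) 
  $2 \sum_{ijkl} p_{ijkl} \ln \frac{p_{ijkl}}{\frac{p_{ijk.} p_{.jkl}}{p_{.jk.}} \cdot \frac{p_{ij.l}}{\sum_k p_{ijk.} p_{.jkl} / p_{.jk.}}}$;  d.f. $c(r-1)(d-1)(t-1)$ 

$(R, T)(D) \mid C$, two-way marginals interaction with depth, given column 
  $2 \sum_{ijl} p_{ij.l} \ln \frac{p_{ij.l}}{\sum_k p_{ijk.} p_{.jkl} / p_{.jk.}}$;  d.f. $c(r-1)(t-1)$ 

$R \times T \mid CD$, one-way by one-way interaction, given CD 
  $2 \sum_{ijkl} p_{ijkl} \ln \frac{p_{ijkl}}{p_{ijk.} p_{.jkl} / p_{.jk.}}$;  d.f. $cd(r-1)(t-1)$ 

(a) Form shown is the second-order interaction after first iteration. 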



3.3. Logarithmic Additivity and a General Interpretation of the No-Interaction Hypothesis 

In Theorem 2.1A, we give the $p^*$-distribution for no second-order interaction as 

(3.11)  $\ln \frac{p^*_{ijk}}{\pi_{ijk}} = \ln a_{ij} + \ln b_{jk} + \ln c_{ik}$, 

where $a_{ij}$, $b_{jk}$, and $c_{ik}$ are functions of the three two-way marginals. In this form, the logarithms of 
the cell probabilities representing the no second-order interaction hypothesis are seen to be the 
sum of a constant and the logarithms of contributions from each of the specified marginals. Similarly, 
for the test of the hypothesis of uniform distribution, we have 

(3.12)  $\ln p^*_{ijk} = \ln \frac{1}{rcd}$, 

and for the test of the hypothesis of three-way independence, 

(3.13)  $\ln p^*_{ijk} = \ln \frac{1}{rcd} + \ln a_i + \ln b_j + \ln c_k$. 

Hence, if we consider the difference in the logarithms of the cell probabilities between the 
independence hypothesis and the uniform distribution hypothesis as represented by a row effect, a 
column effect, and a depth effect, then the striking similarities between our approach to contingency 
table analysis and the approach used in analysis of variance become immediately obvious. 
To begin with, both analyses deal with multifactor multiresponse data. In the linear hypothesis, 
an additive model is assumed; in the analysis of contingency tables, a logarithmic additive model 
is assumed. In both cases the appropriate test statistics can be obtained by minimizing the 
discrimination information [ch. 10, 11, Kullback 1959]. The residuals in the usual linear analysis 
represent the difference between the observed values and the values computed from the model 
using the estimated values of the parameters; the residuals in our analysis represent the differences 
between the observed cell frequencies and the cell frequencies $x^*$ computed under each particular 
hypothesis. The main effects and the various interactions in the analysis of variance also 
find corresponding counterparts in our first-order, second-order, and higher-order interactions. 
Darroch [1961], Bhapkar [1961], Lindley [1964], and Mantel [1966] have all suggested some analogy 
between the two types of analyses with a view to simplifying the analysis of multidimensional 
contingency tables. We remark that the main difference between the two types of analysis is that 
the marginal restraints required in the contingency table analysis, which necessitate the iteration 
procedure, are not present in the analysis of variance. 

We may consider the complete sample table to contain all the "information" available from the 
particular experiment. In the process of analysis, we aim to express the sample table in a reduced 
number of parameters represented by the marginal totals as expressed in (3.11) to (3.13). In other 
words, we are interested in knowing how much of this total information is contained in a summary 
consisting of sets of marginal tables. 

If there is no first-order interaction, i.e., independence of all classifications, then all the infor- 
mation is contained in the first-order marginals in the sense that given these marginals, the complete 
table can be constructed to within sampling error. If the first-order interaction is significant, but 
there is no second-order interaction, then the set of two-way marginals will be required to sum- 
marize the data adequately. The use of two-way tables to summarize multiway classification data 
is a rather common practice, and the implied assumption is therefore "no second- and higher- 
order interactions." 

A direct consequence of this interpretation is that the analysis can be reduced to that of the set 
of marginal tables if there is no interaction of the same order. 

We remark that the set of marginal tables must be considered jointly for proper interpretation, 
and if one or more of these tables show significant interactions, the results of tests of the remaining 
tables could lead to erroneous conclusions. An example of such a case was given in Simpson [1951]. 
The above interpretation is not restricted to complete sets of marginals. If the p*-distribution 
computed from three out of the six two-way marginals in a four-way table is found to be "close 
enough" to the p-distribution by our test, the three two-way marginal tables could be considered as 
containing essentially all the information in the four-way table. The analysis can therefore be per- 
formed on these marginal tables and the complexity of the problem reduced. For example, the 
analysis given in table 3.3 for a four-way table may be reduced to that of one two-way and two three- 
way tables, and that in table 3.4 to that of three two-way tables and one three-way table, provided 
that the corresponding interactions are found to be of no significance. 

A useful by-product of our computer routine is that the set of residuals, x − x*, is computed 
for each interaction hypothesis. Inspection and analysis of these residuals may be used as an aid 
"in assessing the validity or appropriateness of the conventional analysis" as recommended by 
Anscombe and Tukey [1963], in view of the indefiniteness and complexity of objectives of 
statistical analysis of multiresponse data. 

The analysis of categorical data may well follow this general philosophy and take advantage 
of some of the developed techniques for the analysis of residuals. In fact, in a goodness-of-fit test, 
if the computed $\chi^2$ shows significance, we usually look at the larger discrepancies between the 
observed and expected values of the cell frequencies and seek an explanation. However, this 
practice has been restricted mainly to one-way tables. A plausible reason for the lack of such study 
in higher-order tables could be that the computation of expected frequencies becomes complicated 
and unmanageable. Consequently, the analysis is considered complete with a formal test of 
significance. 

With the iterative procedure now available for computing the expected frequencies for each 
particular hypothesis, we could examine these residuals for a number of interesting features, such 
as: 

(a) outliers, or errors in counting and recording, 

(b) the physical interpretation of departure from the particular hypothesis, 

(c) trend over categories in a classification, particularly if these categories are arranged in a 
natural sequence of order of magnitude [Cochran 1954, pp. 434—436], and 

(d) the agreement of these residuals, squared and weighted by the inverses of the expected 
frequencies, with the corresponding $\chi^2$ distribution. 

We have therefore computed and printed these residuals, x − x*, or the normalized residuals, in 
tabular form as a by-product of our computation procedures to facilitate visual examination. 
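
As an illustration of features (a) to (d), the following short Python sketch computes normalized residuals for a single two-way table. It is a sketch under stated assumptions and not the program described in section 4: the counts are the preference by previous-use marginal table X(I*K*) of sample A in the appendix, the hypothesis fitted is simple independence (row total times column total over N), and the function names are hypothetical.

```python
import math


def independence_fit(observed):
    """Expected frequencies row_total * column_total / N (assumed hypothesis)."""
    rows = [sum(r) for r in observed]
    cols = [sum(c) for c in zip(*observed)]
    n = sum(rows)
    return [[r * c / n for c in cols] for r in rows]


def normalized_residuals(observed, expected):
    """Return (x - x*) / sqrt(x*) for every cell of a two-way table."""
    return [[(x - e) / math.sqrt(e) for x, e in zip(row_x, row_e)]
            for row_x, row_e in zip(observed, expected)]


# X(I*K*) from sample A: preference (rows) by previous use (columns).
x = [[301, 207], [225, 275]]
x_star = independence_fit(x)
r = normalized_residuals(x, x_star)

# Feature (d): the squared normalized residuals add up to Pearson's chi-squared.
chi_squared = sum(v * v for row in r for v in row)
largest = max(abs(v) for row in r for v in row)
print(f"chi-squared = {chi_squared:.3f}, largest |residual| = {largest:.3f}")
```

Printing the residuals in the same layout as the data, as the program does, makes outliers and trends over ordered categories easy to spot by eye.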

4. Computer Programs and Examples 

4.1. Computation and Iteration Programs 

The iterative computation process described above for the calculation of cell probabilities, 
or cell frequencies, representing no interaction when certain marginals are considered 
fixed, is ideally suited to electronic computer operation. A program in Fortran V has been 
prepared for this purpose.⁴ A brief description of this program is given below. 

(1) The program is written in double precision mode for the computation of quantities of the 
form $2\sum x \ln x$. These quantities are useful in testing certain hypotheses as illustrated in Kullback, 
Kupperman, and Ku [1962a, 1962b]. The quantity $0 \ln 0$ is defined as zero. 

(2) Input cards are provided for the specification of: 

(a) dimension M of the table, and number of categories within each dimension, with $2 \le M \le 4$ 
and $r \times c \times d \times t \le 10^4$. 

(b) maximum number of complete cycles of iterative computation, and the agreement desired 
between the original given marginals and the computed marginals. Tentatively the maximum num- 
ber of cycles is set at 20 and the agreement optionally at 0.100, 0.010, and 0.001. 

(c) the choice of the set of marginals if these marginals are not a complete set of one-, two-, or 
three-way marginals. Iterative computation for the complete sets of marginals is automatically 
performed. 

(3) The data cards for the table are read in by column within each row, row × column within 
each depth, and row × column × depth within each level. Title cards for each of the classifications 
are provided. 

(4) The following notations are used in the output: 

X(IJKL)               original data. 
Y(IJKL)               cell frequencies corresponding to no first-order interaction. 
Z(IJKL)               cell frequencies corresponding to no second-order interaction. 
W(IJKL)               cell frequencies corresponding to no third-order interaction. 
U1(IJKL), U2(IJKL)    cell frequencies corresponding to specified marginals. 

(5) Outputs of the program for a four-way table are in the order listed below. For two- and 
three-way tables, the input cards will adjust the outputs accordingly. 

(a) Titles of classifications. 

(b) Original table X(IJKL) in the form of two-way tables. 

(c) All marginal three-way, two-way, one-way tables and the grand total. 

(d) All 16 sums of quantities of the form $2\sum X \ln X$. 

(e) Number of complete cycles of iterations performed and the decimal agreement between 
marginals. 

(f) Tables of normalized residuals $\{X(IJKL) - Y(IJKL)\}/\sqrt{Y(IJKL)}$, followed by 

$2\sum Y \ln Y$, 
first-order interaction $= 2\sum X \ln X - 2\sum Y \ln Y$, 
chi-squared $= \sum \{X(IJKL) - Y(IJKL)\}^2 / Y(IJKL)$. 

(g) Print-outs under (f) are repeated for Z and W, and for U1 and U2 when specified. 

Samples of the output are shown in the appendix for the four-way contingency tables used in 
examples 1 and 2. 

⁴ We are indebted to Mrs. Ruth Vainer, Statistical Engineering Laboratory, National Bureau of Standards, for the preparation of this program. 



4.2. Examples 

In the literature there are a number of "classical" examples which have been used to demon- 
strate tests of no second-order interaction in three-way tables. These examples are collected and 
listed in table 4.1 where the values of the m.d.i.s. for no second-order interaction are compared with 
results obtained by other investigators. A number of interesting features are noted. 

(1) The maximum number of complete cycles of iteration used was 10, for the 2 × 2 × 12 table 
due to Snedecor. For the others, 6 to 7 cycles are sufficient for agreement of specified marginals 
to the third decimal place. 

(2) The values of 2I for no second-order interaction agree very well with the values of $X^2$ 
computed through the solutions of systems of simultaneous equations of third degree, i.e., solutions 
with two-way marginals considered as fixed. Solutions based on unrestricted maximum likelihood 
estimates are, however, somewhat lower than our values. 

(3) None of the second-order interactions computed reached the 5-percent level of 
significance. By the interpretation given in subsection 3.3, conclusions drawn from analysis of the 
three two-way marginal tables are valid for the three-way table. 

EXAMPLE 1. Ries and Smith [1963] reported an experiment comparing two detergents, a new 
product X and a standard product M. The three classifications were water softness, at three levels; 
temperature, at two levels; and a factor corresponding to previous experience and no previous 
experience with detergent M. This is a 2 × 2 × 2 × 3 experiment with 

R: preference            i = 1    X 
                             2    M 
C: water temperature     j = 1    low 
                             2    high 
D: previous use          k = 1    nonuser 
                             2    user 
T: water softness        l = 1    hard 
                             2    medium 
                             3    soft 

Ries and Smith used a series of chi-squared tests in their paper; Cox and Lauh [1967] recently 
reexamined the data employing a graphical approach. The data and computations are shown in the 
print-out sample A in the appendix. 
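
The iteration of subsection 4.1 can be sketched in a few lines of Python for the data of example 1. This is a minimal illustration and not the Fortran V program: the cell counts are copied from the original tables of sample A in the appendix, while the uniform starting table, the stopping rule (largest absolute difference between fitted and given marginal totals), and all function names are assumptions made for the sketch.

```python
import math
from itertools import combinations

# Example 1 counts from sample A: blocks[(k, l)][i][j], all indices 0-based.
blocks = {
    (0, 0): [[68, 42], [42, 30]],
    (0, 1): [[66, 33], [50, 23]],
    (0, 2): [[63, 29], [53, 27]],
    (1, 0): [[37, 24], [52, 43]],
    (1, 1): [[47, 23], [55, 47]],
    (1, 2): [[57, 19], [49, 29]],
}
x = {(i, j, k, l): float(blocks[k, l][i][j])
     for (k, l) in blocks for i in range(2) for j in range(2)}


def margin(table, axes):
    """Sum a table (dict: index tuple -> value) over all axes not listed."""
    out = {}
    for idx, value in table.items():
        key = tuple(idx[a] for a in axes)
        out[key] = out.get(key, 0.0) + value
    return out


def fit_marginals(x, marginals, max_cycles=20, agreement=0.001):
    """Scale a uniform table until it reproduces the listed marginals of x."""
    n = sum(x.values())
    fit = {idx: n / len(x) for idx in x}
    targets = [margin(x, axes) for axes in marginals]
    cycles = 0
    for cycles in range(1, max_cycles + 1):
        # One complete cycle: adjust to each fixed marginal in turn.
        for axes, target in zip(marginals, targets):
            current = margin(fit, axes)
            for idx in fit:
                key = tuple(idx[a] for a in axes)
                if current[key] > 0.0:
                    fit[idx] *= target[key] / current[key]
        # Agreement between the given and the computed marginals.
        worst = 0.0
        for axes, target in zip(marginals, targets):
            current = margin(fit, axes)
            worst = max(worst, max(abs(current[k] - target[k]) for k in target))
        if worst <= agreement:
            break
    return fit, cycles


def information(x, fit):
    """2 * sum x ln(x / x*), with 0 ln 0 taken as zero."""
    return 2.0 * sum(v * math.log(v / fit[idx]) for idx, v in x.items() if v > 0)


one_way = [(0,), (1,), (2,), (3,)]
two_way = list(combinations(range(4), 2))
y, cycles_y = fit_marginals(x, one_way)
z, cycles_z = fit_marginals(x, two_way)
print(f"first-order interaction  = {information(x, y):.3f} ({cycles_y} cycle(s))")
print(f"second-order interaction = {information(x, z):.3f} ({cycles_z} cycle(s))")
```

With the one-way marginals the fit is reached in a single cycle and the information is 42.929, the first-order interaction printed in sample A. With the six two-way marginals the value should come out close to the 9.847 printed there, the last digits depending on the agreement requested.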



TABLE 4.1. Comparison of results: second-order interaction in three-way tables 

                                                  Value of 2I with marginal agreement to     Other results 
Example                                           0.10          0.01          0.001          Value     Statistic   Source 

Bartlett [1935]                                   2.357(4) a    2.296(6)      2.294(7)       2.27      X^2         Bartlett. 
  2×2×2                                                                                      2.298     2I          Computed from Bartlett's solution. 
                                                                                             1.93      X^2         Goodman [1964b]. 
                                                                                             2.26      Y^2         Goodman [1964b]. 
                                                                                             2.27      Z^2         Goodman [1964b]. 
                                                                                             8.89      Y^2         Koch [1968]. 
                                                                                             0.083                 Koch [1968] different model. 
                                                                                             7.603     2I          Kullback [1959, without iteration]. 

Kastenbaum and Lamphiear [1959]                   3.1838(2)     3.1592(4)                    3.158     X^2         Kastenbaum [1959], Darroch [1962]. 
  5×3×2                                           3.1660(4)     3.1600(5)                    3.640     2I          KKK [1962b, without iteration]. 
  3×2×5                                           3.1609(3)     3.1591(4)     3.1588(7)      3.13      Y^2         Goodman [1964b]. 
  2×3×5                                                                                      2.8       Y^2         Goodman [1964b]. 
                                                                                             3.12      X^2         Plackett [1962]. 

Snedecor, as quoted on p. 184, Kullback [1959]                  7.7157(10)                   12.608    2I          Kullback (without iteration). 
  2×2×12                                                                                     15.492    2I          Kullback (without iteration). 
                                                                                             7.59      X^2         Norton [1945]. 
                                                                                             7.45      X^2         Goodman [1964b]. 
                                                                                             7.37      Y^2         Goodman [1964b]. 
                                                                                             5.18      Z^2         Goodman [1964b]. 

Kullback [1959], prob. 13.10, p. 188                            0.007(5)                                           Theoretical. 
  2×2×2 

Kullback [1959], table 12.2, p. 180                             7.584(2)                     7.570     2I          Kullback (algebraic). 
  2×4×2 

Kilkberg, Narragon and Campbell [1964]                                        0.0704(5)      0.071     X^2         Koch [1968]. 
  Bhapkar and Koch                                                            0.0430(6)      0.0435    X^2         Koch [1968]. 
  2×2×2                                                                       3.5710(6)      3.3917    X^2         Koch [1968]. 

Schotz [1966]                                                                 10.511(7)      7.22      X^2         Koch [1968, using index of order association]. 
  Bhapkar and Koch 
  2×2×4                                                                                      00 

a Number of complete cycles of iteration (shown in parentheses). 



The analysis of information table corresponding to table 3.1 is shown in table 4.2 below. 

Table 4.2 

Components of information         Information    d.f. 
Third-order interaction              0.739         2 
$2nI(p_3^*:p_2^*)$                   9.108         7 
Second-order interaction             9.847         9 
$2nI(p_2^*:p_1^*)$                  33.081         9 
Four-way independence               42.928        18 
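
The degrees of freedom in table 4.2 can be reproduced by a simple count. The rule coded below (number of cells, less one, less the product of (categories − 1) for every set of classifications whose marginal is held fixed) is the usual one for hierarchical hypotheses of this kind; it is an assumption in the sense that it is not spelled out in this subsection, and the function name is hypothetical.

```python
from itertools import combinations
from math import prod


def interaction_df(levels, order):
    """d.f. left after fixing every marginal of up to `order` classifications."""
    cells = prod(levels)
    fitted = sum(prod(levels[a] - 1 for a in axes)
                 for size in range(1, order + 1)
                 for axes in combinations(range(len(levels)), size))
    return cells - 1 - fitted


example_1 = (2, 2, 2, 3)  # categories of R, C, D, T in example 1
for order, name in [(1, "four-way independence"),
                    (2, "second-order interaction"),
                    (3, "third-order interaction")]:
    print(f"{name}: {interaction_df(example_1, order)} d.f.")
```

For example 1 this gives 18, 9, and 2; the differences between successive lines, 9 and 7, are the degrees of freedom of the two remaining components.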



Neither the third-order nor the second-order interaction reached significance at $\alpha = 0.10$. 
Hence we conclude that the analysis of the six two-way tables will yield the desired information. 
The numerical values of these six interactions are computed in two ways for comparison in table 
4.3. The first set is computed directly from the six two-way tables. The second set is computed 
by using the analysis $2nI(p_2^*:p_1^*) = 2\sum_{ij} x_{ij\cdot\cdot} \ln a_{ij} + \cdots + 2\sum_{kl} x_{\cdot\cdot kl} \ln f_{kl}$. The sum of the second 
set, 33.083, should equal the component $2nI(p_2^*:p_1^*)$, 33.081, in table 4.2. The difference between 
the two sums, 33.763 and 33.083, represents the effect of the marginal restraints. 



Table 4.3 

Components of information                      Information          d.f. 
Preference and water temp. (R × C)            4.361      4.393       1 
Preference and previous use (R × D)          20.581     19.920       1 
Preference and water softness (R × T)          .395       .424       2 
Water temp. and previous use (C × D)          1.252      1.314       1 
Water temp. and water softness (C × T)        6.099      6.089       2 
Previous use and water softness (D × T)       1.075       .943       2 
                                             33.763     33.083       9 


The main conclusion here is that preference is highly dependent on previous use, and to a 
certain extent dependent on the water temperature. The water temperature effect depends 
somewhat on the degree of softness of the water. The nonsignificance of (C × D) and (D × T) shows that 
the samples of previous users and nonusers of M are not biased with respect to water 
temperature and water softness. 

We also include here analysis of information tables 4.4 and 4.5, corresponding to tables 3.3 
and 3.4 respectively. 

Table 4.4 

Components          Information    d.f. 
(CT × R)(D)            8.059         5 
(CT, R)(D)             4.185         5 
CT × R | D            12.244        10 

Table 4.5 

Components          Information    d.f. 
(R × C × T)(D)         9.725         7 
(R, C, T)(D)          10.294         7 
R × C × T | D         20.019        14 

Since none of these components of information reached significance at $\alpha = 0.10$, we conclude 
that the interactions between preference and water temperature-water softness are not different 
for the previous user and nonuser groups. If this conclusion is accepted, then separate analyses 
of the previous user group and the nonuser group appear to be unnecessary. 



EXAMPLE 2. For the second example we use the survey results reported by Hoyt, Krishnaiah, and 
Torrance [1959], analyzed also in Kullback, Kupperman, and Ku [1962a] for nine hypotheses of 
independence and conditional independence. The four classifications are: 

                                     Categories 
D: high school ranks                     3 
C: post high school status               4 
T: sex                                   2 
R: father's occupational level           7 



The data (sample B, appendix) showed considerable heterogeneity and all the nine hypotheses 
tested in the above analysis gave highly significant results. We continue the analysis in table 4.6. 
All the interactions are again highly significant excepting the third-order interaction for which 
p = 0.15. 
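
The significance levels quoted for these components can be checked with a short routine for the upper tail of the $\chi^2$ distribution. The sketch below uses the standard finite-series expressions for integer degrees of freedom; it is offered only as an illustration, is not part of the program of subsection 4.1, and the function name is hypothetical.

```python
import math


def chi2_upper_tail(x, df):
    """Upper-tail probability of chi-squared with integer d.f., by finite series."""
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        # Even d.f.: exp(-x/2) * sum over j < df/2 of (x/2)^j / j!
        term = total = 1.0
        for j in range(1, df // 2):
            term *= (x / 2.0) / j
            total += term
        return math.exp(-x / 2.0) * total
    # Odd d.f.: normal tail plus a finite correction series
    # with terms x^(r - 1/2) / (1 * 3 * ... * (2r - 1)).
    term = math.sqrt(x)
    total = 0.0
    for r in range(1, (df - 1) // 2 + 1):
        if r > 1:
            term *= x / (2 * r - 1)
        total += term
    return (math.erfc(math.sqrt(x / 2.0))
            + math.sqrt(2.0 / math.pi) * math.exp(-x / 2.0) * total)


p_third = chi2_upper_tail(44.793, 36)     # third-order interaction, table 4.6
p_second = chi2_upper_tail(172.257, 108)  # second-order interaction, table 4.6
print(f"third-order:  p = {p_third:.3f}")
print(f"second-order: p = {p_second:.6f}")
```

The third-order value comes out near the 0.15 quoted above, while the second-order value is far below any conventional level.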

Since the second-order interaction is highly significant, we analyze the difference between 
second-order and third-order interaction into its component parts in table 4.7, the second set in 
accordance with the second part of table 3.2. These four components represent second-order 
interactions in the four three-way tables subject to the three-way marginal restraints. The first 
set corresponds to the analysis $2nI(p_3^*:p_2^*) = 2\sum_{ijk} x_{ijk\cdot} \ln a_{ijk} + \cdots + 2\sum_{jkl} x_{\cdot jkl} \ln d_{jkl}$. 

We note that all the second-order interactions are significant when the R classification is 
involved, i.e., the interactions C × D, C × T, and D × T are different for different levels 
of fathers' occupation. These results, and the fact that there is an unusually large number of 
girls compared with boys for the third level of fathers' occupation, as shown in the table for 
X(I**L), suggest that the counts for this level may be suspect. 

Table 4.6 

Components of information     Information     d.f. 
$2nI(p:p_3^*)$                   44.793         36 
$2nI(p_3^*:p_2^*)$              127.464         72 
$2nI(p:p_2^*)$                  172.257        108 
$2nI(p_2^*:p_1^*)$             3320.858         47 
$2nI(p:p_1^*)$                 3493.115        155 


Table 4.7 

Marginal restraints added         Information          d.f. 
All two-way marginals          172.257    172.257      108 
RCD                             53.841     52.267       36 
                                          119.990       72 
RCT                             45.161     44.630       18 
                                           75.360       54 
RDT                             25.477     27.588       12 
                                           47.772       42 
CDT                              2.985      2.979        6 
All three-way marginals         44.793     44.793       36 
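
The arithmetic of table 4.7 can be verified in a few lines. The sketch below, with figures copied from the table and variable names chosen only for illustration, subtracts the second-set components one at a time from the information with all two-way marginals fixed, and adds up the first-set components.

```python
# Table 4.7, second set: information removed as each three-way marginal is added.
components = {"RCD": (52.267, 36), "RCT": (44.630, 18),
              "RDT": (27.588, 12), "CDT": (2.979, 6)}

remaining, df = 172.257, 108  # all two-way marginals fixed
trail = []
for name, (info, d) in components.items():
    remaining, df = round(remaining - info, 3), df - d
    trail.append((name, remaining, df))
    print(f"after {name}: {remaining:.3f} on {df} d.f.")

# Table 4.7, first set: the four components add up to 2nI(p3*:p2*) of table 4.6.
first_set_total = round(53.841 + 45.161 + 25.477 + 2.985, 3)
print(f"first set total: {first_set_total:.3f}")
```

The running remainders reproduce the intermediate lines 119.990, 75.360, and 47.772 and end at 44.793 on 36 d.f., the third-order interaction; the first set sums to 127.464.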



Table 4.8 

Marginal restraints added      Information     d.f. 
All two-way marginals           109.521         91 
RCD                              29.521         30 
                                 80.000         61 
RCT                              27.738         15 
                                 52.262         46 
RDT                              11.986         10 
                                 40.276         36 
CDT                                .673          6 
All three-way marginals          39.603         30 

Table 4.8 is an analysis of the data with the third level of fathers' occupation deleted. The 
second-order interaction is still significant at $\alpha = 0.10$, but not at $\alpha = 0.05$. Of the components of 
second-order interaction, however, only RCT remains significant. The interpretation that the 
interactions between post high school status and sex are different for different levels of fathers' 
occupation appears to be a reasonable one. 

We would suggest, therefore, that the data for the third level of fathers' occupation be re- 
checked. 

5. Summary 

Using the basic notions of information theory, we have developed in the above sections a 
unified approach to the analysis of multiway contingency tables. Under this approach the prin- 
ciple of minimum discrimination information is proposed and used to generate hypotheses of 
interest. It is shown that all classical hypotheses for contingency tables can be generated through 
the use of this principle when certain marginals are considered fixed. 

For each set of fixed marginals, a unique set of cell probabilities {p*} is generated by 
minimizing the discrimination information. The set {p*} corresponds to the cell probabilities 
representing no interaction, and typically can be expressed in a logarithmic linear form: 

$\ln p^*_{ijk} = \text{const.} + \ln a_{ij} + \ln b_{jk} + \ln c_{ik},$ 

where $a_{ij}$, $b_{jk}$, and $c_{ik}$ are functions of cell probabilities of the corresponding fixed two-way 
marginal tables. The difference between the set of cell probabilities estimated from data and {p*} is 
therefore a measure of interaction. 

If the complete set of one-way marginals is considered fixed, the set {p*} represents the 
cell probabilities under the independence hypothesis. If the complete set of two-way marginals 
is considered fixed, the set {p*} gives the cell probabilities representing no second-order 
interaction. In this sense the higher-order no-interaction hypotheses can be considered as hypotheses 
of "generalized" independence, a concept which unifies the many attempts at the formulation of 
second-order interaction described in brief in the introductory section. 

The relationship between minimum discrimination information and maximum entropy is 
examined, and the analogy between the proposed analysis and the analysis of variance using least 
squares theory is noted. Interpreting the no-interaction hypothesis as the statement that 
"the given marginal tables are sufficient and contain all the information of the full table" reduces 
the dimension of the table, and hence also the complexity of the analysis. 

The expression for p* for given marginals is given in theorem 2.1, and the convergence of the 
iterative computation procedure to the unique set {p*} in theorem 2.2. 

Analysis of information tables for four-way tables are given for first-, second-, and third-order 
interactions, and also for selected mixed-order interactions. A Fortran program to aid in the 
computation has been prepared. 

Two illustrative examples in the analysis of four-way tables are included. 



6. Appendix. Samples of Selected Portions of Computer Print-Out 

SAMPLE A   EXAMPLE 1   (Tables of residuals suppressed) 

R   PREFERENCE M OR X                  I = 1, 2 
C   WATER TEMPERATURE                  J = 1, 2 
D   PREVIOUS USER OR NONUSER OF M      K = 1, 2 
T   WATER SOFTNESS                     L = 1, 3 

ORIGINAL TABLES 

X(IJ11)        X(IJ12)        X(IJ13) 
 68  42         66  33         63  29 
 42  30         50  23         53  27 

X(IJ21)        X(IJ22)        X(IJ23) 
 37  24         47  23         57  19 
 52  43         55  47         49  29 

MARGINAL TABLES 

THREE-WAY TABLES 

X(IJ1*)        X(*J1L)           X(1J*L)           X(I*1L) 
197 104        110 116 116       105 113 120       110  99  92 
145  80         72  56  56        66  56  48        72  73  80 

X(IJ2*)        X(*J2L)           X(2J*L)           X(I*2L) 
141  66         89 102 106        94 105 102        61  70  76 
156 119         67  70  48        73  70  56        95 102  78 

TWO-WAY TABLES 

X(IJ**)        X(I*K*)           X(I**L) 
338 170        301 207           171 169 168 
301 199        225 275           167 175 158 

X(*JK*)        X(*J*L)           X(**KL) 
342 297        199 218 222       182 172 172 
184 185        139 126 104       156 172 154 

ONE-WAY TABLES 

X(I***)        X(*J**)        X(**K*)        X(***L) 
508 500        639 369        526 482        338 344 326 

TOTAL 
1008 

PRINT OF SUMS 

SUM 2X(IJKL)LNX(IJKL) = .7653768886020237+004 
SUM 2X(IJK*)LNX(IJK*) = .9853563803650491+004 
SUM 2X(*JKL)LNX(*JKL) = .9018264486617197+004 
SUM 2X(IJ*L)LNX(IJ*L) = .9017229799353486+004 
SUM 2X(I*KL)LNX(I*KL) = .8962284823389048+004 
SUM 2X(IJ**)LNX(IJ**) = .1122496619356278+005 
SUM 2X(I*K*)LNX(I*K*) = .1116989524449928+005 
SUM 2X(I**L)LNX(I**L) = .1033087165370435+005 
SUM 2X(*JK*)LNX(*JK*) = .1122371544867871+005 
SUM 2X(*J*L)LNX(*J*L) = .1040972402615119+005 
SUM 2X(**KL)LNX(**KL) = .1033340920997269+005 
SUM 2X(I***)LNX(I***) = .1254477724916192+005 
SUM 2X(*J**)LNX(*J**) = .1261792581599639+005 
SUM 2X(**K*)LNX(**K*) = .1254663500174484+005 
SUM 2X(***L)LNX(***L) = .1172779757827051+005 
2N LN N = .1394209847244072+005 

2Y LN Y = .7610840227851502+004 
FIRST-ORDER INTERACTION = .4292865816+002 
CHI-SQUARED = .4390224840+002 

2Z LN Z = .7643922243020641+004 
SECOND-ORDER INTERACTION = .9846642999+001 
CHI-SQUARED = .9870614978+001 

2W LN W = .7653029670355177+004 
THIRD-ORDER INTERACTION = .7392156650+000 
CHI-SQUARED = .7379092751+000 

SPECIFIED MARGINALS I*K* *JKL 
NO. OF ITERATIONS = 1 CYCLE 
AGREEMENT BETWEEN MARGINALS TO .100-01 
2(U1) LN (U1) = .7641524729371637+004 
INTERACTIONS (U1) = .1224415664+002 
CHI-SQUARED = .1220141783+002 

SPECIFIED MARGINALS I*K* *JKL IJ*L 
NO. OF ITERATIONS = 3 CYCLES 
AGREEMENT BETWEEN MARGINALS TO .100-01 
2(U2) LN (U2) = .7645709639307871+004 
INTERACTIONS (U2) = .8059246712+001 
CHI-SQUARED = .8054468429+001 

SPECIFIED MARGINALS I*K* *JK* **KL 
NO. OF ITERATIONS = 1 CYCLE 
AGREEMENT BETWEEN MARGINALS TO .100-01 
2(U3) LN (U3) = .7633749899660993+004 
INTERACTIONS (U3) = .2001898635+002 
CHI-SQUARED = .2069489400+002 

SPECIFIED MARGINALS I*K* *JK* **KL IJ*L 
NO. OF ITERATIONS = 3 CYCLES 
AGREEMENT BETWEEN MARGINALS TO .100-01 
2(U4) LN (U4) = .7644043799921927+004 
INTERACTIONS (U4) = .9725086098+001 
CHI-SQUARED = .9720352364+001 

SAMPLE B   EXAMPLE 2   (Mixed order interactions not shown) 

R   FATHER OCCUPATIONAL LEVEL     I = 1, 2, 3, 4, 5, 6, 7 
C   POST HIGH SCHOOL STATUS       J = 1, 2, 3, 4 
D   HIGH SCHOOL RANKS             K = 1, 2, 3 
T   SEX                           L = 1, 2 

ORIGINAL TABLES 

      X(IJ11)                  X(IJ21)                  X(IJ31) 
  87    3   17  105       216    4   14  118       256    2   10   53 
  72    6   18  209       159   14   28  227       176    8   22   95 
  52   17   14  541       119   13   44  578       119   10   33  257 
  88    9   14  328       158   15   36  304       144   12   20  115 
  32    1   12  124        43    5    7  119        42    2    7   56 
  14    2    5  148        24    6   15  131        24    2    4   61 
  20    3    4  109        41    5   13   88        32    2    4   41 

      X(IJ12)                  X(IJ22)                  X(IJ32) 
  53    7   13   76       163   30   28  118       309   17   38   89 
  36   16   11  111       116   41   53  214       225   49   68  210 
  52   28   49  521       162   64  129  708       243   79  184  448 
  48   18   29  191       130   47   62  305       237   57   63  219 
  12    5   10  101        35   11   37  152        72   20   21   95 
   9    1   15  130        19   13   22  174        42   10   19  105 
   3    1    6   88        25    9   15  158        36   14   19   93 



MARGINAL TABLES 

THREE-WAY TABLES 

      X(IJ1*)                  X(IJ2*)                  X(IJ3*) 
 140   10   30  181       379   34   42  236       565   19   48  142 
 108   22   29  320       275   55   81  441       401   57   90  305 
 104   45   63 1062       281   77  173 1286       362   89  217  705 
 136   27   43  519       288   62   98  609       381   69   83  334 
  44    6   22  225        78   16   44  271       114   22   28  151 
  23    3   20  278        43   19   37  305        66   12   23  166 
  23    4   10  197        66   14   28  246        68   16   23  134 

THREE-WAY TABLES (Continued) 

 X(*J1L)          X(*J2L)          X(*J3L) 
 365  213         760  650         793 1164 
  41   76          62  215          38  246 
  84  133         157  346         100  412 
1564 1218        1565 1829         678 1259 

 X(1J*L)          X(2J*L)          X(3J*L) 
 559  525         407  377         290  457 
   9   54          28  106          40  171 
  41   79          68  132          91  362 
 276  283         531  535        1376 1677 

 X(4J*L)          X(5J*L)          X(6J*L)          X(7J*L) 
 390  415         117  119          62   70          93   64 
  36  122           8   36          10   24          10   24 
  70  154          26   68          24   56          21   40 
 747  715         299  348         340  409         238  339 

 X(I*1L)          X(I*2L)          X(I*3L) 
 212  149         352  339         321  453 
 305  174         428  424         301  552 
 624  650         754 1063         419  954 
 439  286         513  544         291  576 
 169  128         174  235         107  208 
 169  155         176  228          91  176 
 136   98         147  207          79  162 

TWO-WAY TABLES 

      X(IJ**)                 X(I*K*)               X(I**L) 
1084   63  120  559       361  691  774          885  941 
 784  134  200 1066       479  852  853         1034 1150 
 747  211  453 3053      1274 1817 1373         1797 2667 
 805  158  224 1462       725 1057  867         1243 1406 
 236   44   94  647       297  409  315          450  571 
 132   34   80  749       324  404  267          436  559 
 157   34   61  577       234  354  241          362  467 

    X(*JK*)               X(*J*L)               X(**KL) 
 578 1410 1957           1918 2027             2054 1640 
 117  277  284            141  537             2544 3040 
 217  503  512            341  891             1609 3081 
2782 3394 1937           3807 4306 

ONE-WAY TABLES 

X(I***) 
1826 2184 4464 2649 1021 995 829 

X(*J**) 
3945 678 1232 8113 

X(**K*) 
3694 5584 4690 

X(***L) 
6207 7761 

TOTAL 
13968 



PRINT OF SUMS 

SUM 2X(IJKL)LNX(IJKL) = .1419490011041468+006 
SUM 2X(IJK*)LNX(IJK*) = .1602167221282754+006 
SUM 2X(*JKL)LNX(*JKL) = .1900742436081673+006 
SUM 2X(IJ*L)LNX(IJ*L) = .1706458462114764+006 
SUM 2X(I*KL)LNX(I*KL) = .1680009679212620+006 
SUM 2X(IJ**)LNX(IJ**) = .1893749331578995+006 
SUM 2X(I*K*)LNX(I*K*) = .1866950003995916+006 
SUM 2X(I**L)LNX(I**L) = .1976933167165831+006 
SUM 2X(*JK*)LNX(*JK*) = .2085192406453136+006 
SUM 2X(*J*L)LNX(*J*L) = .2189304551608272+006 
SUM 2X(**KL)LNX(**KL) = .2175298383613909+006 
SUM 2X(I***)LNX(I***) = .2168255594020366+006 
SUM 2X(*J**)LNX(*J**) = .2377594145686747+006 
SUM 2X(**K*)LNX(**K*) = .2363330848251296+006 
SUM 2X(***L)LNX(***L) = .2474453182056570+006 
2N LN N = .2666358302324256+006 

TABLE OF NORMALIZED RESIDUALS 

            R(IJ11)                            R(IJ12) 
-1.162   -.213   2.938   -.481       .155    .113    .552    .616 
  .671   -.308   1.295    .579       .277    .769  -1.494  -1.404 
-1.619   2.262  -2.707   -.476       .664    .178   -.408   1.308 
  .484   -.156   -.868    .881       .254   -.385    .913  -1.403 
 1.779   -.916   1.954   -.627      -.734   -.323   -.278   -.060 
  .116    .062   -.404   -.499       .007  -1.722   1.610    .478 
 1.519    .996   -.029    .108     -2.067  -1.552   -.246    .216 

            R(IJ21)                            R(IJ22) 
  .155   -.947   -.548   -.152      -.434   2.540   -.713    .244 
  .523    .119    .212   -.251      -.195    .249    .112   -.300 
 -.835   -.963   -.796   1.344      1.829   -.673   -.580   -.803 
 -.201   -.207   1.106    .181      -.277   -.329   -.032   -.003 
 -.139    .523  -1.305   -.402     -1.039   -.831   1.893    .728 
 -.091   1.735   1.771   -.699     -1.064    .552   -.345    .408 
 1.925   1.042   1.940  -2.536      -.863   -.754   -.857   1.736 

            R(IJ31)                            R(IJ32) 
 1.220  -1.218   -.599   -.225      -.281  -1.242   -.326    .060 
 -.253   -.737    .162  -1.584      -.527   -.227   -.141   2.448 
 -.889   -.329   -.084   2.053       .416    .302   2.438  -2.597 
-1.224    .555    .004   -.722      1.011    .485   -.801    .675 
 -.190   -.337   -.198    .842       .373   1.238  -1.661   -.247 
  .248    .134   -.808   1.027       .659   -.404  -1.182   -.355 
  .800    .092   -.319   -.359     -1.417    .724   -.063    .653 

2Z LN Z = .1417767440958978+006 
SECOND-ORDER INTERACTION = .1722570082+003 
CHI-SQUARED = .1724668778+003 

TABLE OF NORMALIZED RESIDUALS 

            R(IJ11)                            R(IJ12) 
 -.384    .383   1.070   -.079       .520   -.216   -.943    .094 
 -.524   -.937    .835    .290       .818    .763   -.852   -.387 
  .291   1.084   -.593   -.153      -.280   -.694    .352    .157 
  .018   -.598  -1.085    .363      -.024    .495    .952   -.463 
  .620   -.773   1.245   -.509      -.848    .573   -.978    .594 
  .143    .541   -.863    .097      -.170   -.495    .656   -.103 
  .513    .491   -.550   -.156      -.953   -.560    .579    .177 

            R(IJ21)                            R(IJ22) 
  .023   -.422   -.429    .214      -.026    .174    .332   -.210 
 -.278    .430   -.736    .417       .334   -.233    .598   -.418 
 -.150   -.600    .541    .023       .131    .299   -.300   -.021 
  .103   -.140    .297   -.141      -.113    .081   -.218    .142 
  .316   1.063  -1.518    .132      -.333   -.523    .947   -.115 
  .379   -.037    .937   -.421      -.392    .025   -.640    .378 
 -.038    .252    .879   -.324       .049   -.172   -.665    .249 

            R(IJ31)                            R(IJ32) 
  .208    .267   -.623   -.201      -.188   -.082    .363    .158 
  .628    .484    .210  -1.009      -.532   -.177   -.116    .732 
 -.037   -.462   -.179    .190       .026    .179    .077   -.142 
 -.121    .824    .716   -.369       .095   -.328   -.364    .275 
 -.777   -.452    .750    .608       .655    .171   -.362   -.439 
 -.451   -.339   -.398    .493       .367    .176    .207   -.358 
 -.329   -.703   -.656    .786       .329    .356    .370   -.479 

2W LN W = .1419042077670569+006 
THIRD-ORDER INTERACTION = .4479333708+002 
CHI-SQUARED = .4418652549+002 